Skip to content
Closed
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions doc/api/string_decoder.md
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,16 @@ substitution characters appropriate for the character encoding.
If the `buffer` argument is provided, one final call to `stringDecoder.write()`
is performed before returning the remaining input.

### stringDecoder.reset([encoding])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is an implementation detail users don't have to know about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is, there is no way for the users to reset the string decoder, right? They have to create a new instance if needed. This will enable reusability.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new way to reset the state each time was not requested as far as I read the issue. So I also prefer not to expose the reset function. If I am correct the following should work.

const { StringDecoder } = require('string_decoder');
const decoder = new StringDecoder('utf8');

decoder.write(Buffer.from([0xE2, 0x82])); // => ''
decoder.end(); // => '�'
decoder.write(Buffer.of(0x61)); // => 'a'

<!-- YAML
added: REPLACEME
-->

* `encoding` {string} The character encoding the `StringDecoder` will use.
Defaults to the current `encoding` of the decoder.

Flushes all the internal data and the decoder will behave like a new object.

### stringDecoder.write(buffer)
<!-- YAML
added: v0.1.99
Expand All @@ -82,3 +92,4 @@ Returns a decoded string, ensuring that any incomplete multibyte characters at
the end of the `Buffer` are omitted from the returned string and stored in an
internal buffer for the next call to `stringDecoder.write()` or
`stringDecoder.end()`.

42 changes: 33 additions & 9 deletions lib/string_decoder.js
Original file line number Diff line number Diff line change
Expand Up @@ -41,36 +41,43 @@ function normalizeEncoding(enc) {
// characters.
exports.StringDecoder = StringDecoder;
function StringDecoder(encoding) {
if (encoding === this.encoding && encoding !== undefined) {
this.lastNeed = 0;
this.lastTotal = 0;
this.lastChar = Buffer.allocUnsafe(this._nb || 0);
return;
}

this.encoding = normalizeEncoding(encoding);
var nb;
switch (this.encoding) {
case 'utf16le':
this.text = utf16Text;
this.end = utf16End;
nb = 4;
this._nb = 4;
break;
case 'utf8':
this.fillLast = utf8FillLast;
nb = 4;
this._nb = 4;
break;
case 'base64':
this.text = base64Text;
this.end = base64End;
nb = 3;
this._nb = 3;
break;
default:
this.write = simpleWrite;
this.end = simpleEnd;
this._nb = 0;
return;
}
this.lastNeed = 0;
this.lastTotal = 0;
this.lastChar = Buffer.allocUnsafe(nb);
this.lastChar = Buffer.allocUnsafe(this._nb);
}

StringDecoder.prototype.reset = StringDecoder;

StringDecoder.prototype.write = function(buf) {
if (buf.length === 0)
return '';

var r;
var i;
if (this.lastNeed) {
Expand All @@ -87,7 +94,7 @@ StringDecoder.prototype.write = function(buf) {
return r || '';
};

StringDecoder.prototype.end = utf8End;
StringDecoder.prototype.end = end;

// Returns only complete characters in a Buffer
StringDecoder.prototype.text = utf8Text;
Expand Down Expand Up @@ -282,3 +289,20 @@ function simpleWrite(buf) {
function simpleEnd(buf) {
return (buf && buf.length ? this.write(buf) : '');
}

function end(buf) {
let result = '';

if (this.encoding === 'utf16le')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might consider using a switch here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.

result = utf16End.call(this, buf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're the only ones calling these methods directly, then we should be able to avoid the overhead of .call() and just pass an instance as another argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.

else if (this.encoding === 'utf8')
result = utf8End.call(this, buf);
else if (this.encoding === 'base64')
result = base64End.call(this, buf);
else
result = simpleEnd.call(this, buf);

this.reset();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to use this.reset(this.encoding) to avoid the switch in the constructor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still cannot get rid of the switch in the constructor, right?

Copy link
Contributor

@mscdex mscdex Dec 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, but this.reset(this.encoding) would only hit that first if in the constructor, which might be better.


return result;
}
10 changes: 10 additions & 0 deletions test/parallel/test-string-decoder-end.js
Original file line number Diff line number Diff line change
Expand Up @@ -66,3 +66,13 @@ function testBuf(encoding, buf) {
assert.strictEqual(res1, res3, 'one byte at a time should match toString');
assert.strictEqual(res2, res3, 'all bytes at once should match toString');
}

{
// test to check if the write after end doesn't accumulate the data
const decoder = new SD('utf8');
const euroPart1 = Buffer.from([0xE2]);
const euroPart2 = Buffer.from([0x82, 0xAC]);
decoder.end(euroPart1);
const result = decoder.write(euroPart2);
assert.notStrictEqual(result, '€');
}