[M]Fix issue#1436 #1437

jmecn · 2020-11-27T08:51:06Z

Fix the code error when checking UTF-8 data

Let content = [0xE4, 0x8A, 0xBC] and let b = 0xE4 (1110 0100).

See this part:

                if (b < 0x80) {
                    // good
                }
                else if ((b & 0xC0) == 0xC0) {//   (0xE4 & 0xC0) == 0xC0     =====>  true
                    utf8State = UTF8_2BYTE;
                }
                else if ((b & 0xE0) == 0xE0) {//   (0xE4 & 0xE0) == 0xE0      =====>  true
                    utf8State = UTF8_3BYTE_1;
                }
                else {
                    utf8State = UTF8_ILLEGAL;
                }

Because the first mask test already matches, 3-byte UTF-8 data will always be treated as 2-byte UTF-8 data.
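For comparison, here is a minimal sketch of how a lead byte could be classified so that each mask selects exactly one prefix pattern. The class `Utf8LeadByte` and the method `sequenceLength` are hypothetical names used only for illustration; they are not part of jMonkeyEngine, and this is not necessarily the change made in this pull request. The sketch only inspects the prefix bits and does not reject overlong or out-of-range encodings.

```java
// Illustrative sketch only: Utf8LeadByte and sequenceLength are hypothetical
// names, not jMonkeyEngine API.
final class Utf8LeadByte {
    private Utf8LeadByte() {
    }

    /**
     * Returns the sequence length (1 to 4) announced by a UTF-8 lead byte,
     * or -1 if the byte cannot start a sequence (for example a continuation byte).
     */
    static int sequenceLength(int b) {
        b &= 0xFF; // treat the value as an unsigned byte
        if (b < 0x80) {
            return 1; // 0xxxxxxx: ASCII
        }
        if ((b & 0xE0) == 0xC0) {
            return 2; // 110xxxxx
        }
        if ((b & 0xF0) == 0xE0) {
            return 3; // 1110xxxx
        }
        if ((b & 0xF8) == 0xF0) {
            return 4; // 11110xxx
        }
        return -1; // 10xxxxxx (continuation) or 11111xxx (invalid)
    }

    public static void main(String[] args) {
        System.out.println(sequenceLength(0xE4)); // prints 3
        System.out.println(sequenceLength(0x8A)); // prints -1 (continuation byte)
    }
}
```

The key difference from the snippet above is that each mask includes one more bit than the prefix it tests for, so `0xE4` no longer satisfies the 2-byte test.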

stephengold · 2020-11-28T17:09:41Z

Thank you for providing the fix. Please change "is" to "are" in 3 places. Other than that, this looks great.

jmecn · 2020-11-30T08:59:05Z

It would be better to always treat String data as UTF-8:
write it as UTF-8, read it as UTF-8.

See https://hub.jmonkeyengine.org/t/code-error-on-checking-utf-8-data/43909

Let a 3-byte UTF-8 sequence be [0xE4, 0x8A, 0xBC]. When b = 0xE4 (1110 0100), it is treated as the start of a 2-byte sequence. See this part:

```java
if (b < 0x80) {
    // good
}
else if ((b & 0xC0) == 0xC0) { // (0xE4 & 0xC0) == 0xC0  =====>  true
    utf8State = UTF8_2BYTE;
}
else if ((b & 0xE0) == 0xE0) { // (0xE4 & 0xE0) == 0xE0  =====>  true
    utf8State = UTF8_3BYTE_1;
}
else {
    utf8State = UTF8_ILLEGAL;
}
```

3-byte UTF-8 data will always be treated as 2-byte UTF-8 data. It is better to always treat String data as UTF-8 now.

See https://hub.jmonkeyengine.org/t/code-error-on-checking-utf-8-data/43909

riccardobl · 2020-12-02T13:32:07Z

Can you use StandardCharsets.UTF_8 for the charset?

jmecn · 2020-12-03T02:12:30Z

> Can you use StandardCharsets.UTF_8 for the charset?

OK, but it's better to use it in both input and output.

https://github.com/jMonkeyEngine/jmonkeyengine/blob/2196e4c/jme3-core/src/plugins/java/com/jme3/export/binary/BinaryOutputCapsule.java#L688
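A minimal sketch of the "write it as UTF-8, read it as UTF-8" idea, using only the Java standard library. The class `Utf8RoundTrip` and its methods are hypothetical and do not reproduce the actual `BinaryOutputCapsule` or input-side code; they only show the same `StandardCharsets.UTF_8` constant being used in both directions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch only: Utf8RoundTrip is a hypothetical helper,
// not jMonkeyEngine API.
final class Utf8RoundTrip {
    private Utf8RoundTrip() {
    }

    /** Encodes a string for output, always as UTF-8. */
    static byte[] encode(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes bytes read from input, always as UTF-8. */
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The 3-byte sequence from the report; it encodes U+42BC.
        byte[] threeBytes = {(byte) 0xE4, (byte) 0x8A, (byte) 0xBC};
        String text = decode(threeBytes);
        System.out.println(text.length());                           // prints 1
        System.out.println(Arrays.equals(encode(text), threeBytes)); // prints true
    }
}
```

Using the `StandardCharsets.UTF_8` constant rather than a charset name string avoids the checked `UnsupportedEncodingException` and keeps the reader and writer from drifting apart.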

riccardobl

Perfect.

stephengold added this to the Future Release milestone Nov 29, 2020

stephengold mentioned this pull request Nov 29, 2020

Code error on checking UTF-8 data #1436

Closed

riccardobl self-requested a review December 2, 2020 13:33

[M]Use StandardCharsets.UTF_8 instead of constant 'UTF8'

7cee45a

riccardobl approved these changes Dec 3, 2020

View reviewed changes

stephengold merged commit d4a7ad7 into jMonkeyEngine:master Dec 5, 2020

stephengold modified the milestones: Future Release, v3.4.0 Mar 13, 2021

stephengold linked an issue Mar 16, 2021 that may be closed by this pull request

Code error on checking UTF-8 data #1436

Closed
