-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Only pre-initialize ICU when using Windows Kit ICU #5011
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@jefgen the docs say u_init can be called multiple times with no effect, however @srl295 mentions in nodejs/node#13053 that this is not the case. If u_init can't be guaranteed to no-op on the nth call for n > 1, we may need to look into cases where Intl can get initialized twice, such as across different iframes or node-land |
|
As you noted in the comments in the code change, we don't expose the API Edit: If node doesn't ship with any data file (and only provides it via the udata_setCommonData function), then this error would make more sense. |
|
That is correct, we were getting back U_FILE_ACCESS_ERROR in node when the small data was used and set with udata_setCommonData. There is no particularly good way to plumb the information about if node has decided to use small data through when compiling and running ChakraCore, so I think I am happy with just keeping this code specific to windows kit ICU. |
|
@jackhorton I said:
The question was whether you could call
Hope this helps. |
|
Ah, I understand -- sorry for the confusion. That makes things quite a bit easier on our end. Thanks! |
|
Does the "small-icu" data for Node include any converters (code-pages)? If not then then U_FILE_ACCESS_ERROR returned from u_init might make sense as it is essentially trying to load the list of converters. |
|
Thanks for the reply @srl295 ! 👍 |
| // to Windows 10 ICU because we know udata_setCommonData will not be called before then (since it is not allowed). This | ||
| // change primarily exists to guard against OOMs in ICU being treated as unique bugs by Crawler, so it really only *needs* | ||
| // to be enabled when we are using Windows 10 ICU, anyways. | ||
| // TODO(jahorto): investigate why u_init can't be called after udata_setCommonData |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand this. u_init should be called after udata_setCommonData (that's the design and what Node.js does). The U_FILE_ACCESS_ERROR may be something wrong with the data/configuration. Try compiling with -DUDATA_DEBUG=1 and make sure your terminal's scrollback buffer is cranked way up…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I am missing something, I didn't see where node called u_init in the same path as calling udata_setCommonData. so I assumed it was intentional.
|
Note: I still think that true "out-of-memory" errors when loading data should be reported as such, rather than being converted into other errors like FYI: I filed a ICU ticket for this: http://bugs.icu-project.org/trac/ticket/13712 |
|
ACtually u_init docs say:
Yes. That's a good point though. Could you try your test but with a full ICU dataset? Definitely if you have a 'stubdata dataset' (the 0-length data) you might get a file access error. |
|
So Node-ChakraCore with full-icu works properly without this change. A step further, this issue is definitely with calling u_init on the data file in small-icu -- this is with the released version of node-v8 8.11.1. ~/dev/node-cc2 [v8.x] $ node --version
v8.11.1
~/dev/node-cc2 [v8.x] $ node -pe process.versions
{ http_parser: '2.8.0',
node: '8.11.1',
v8: '6.2.414.50',
uv: '1.19.1',
zlib: '1.2.11',
ares: '1.10.1-DEV',
modules: '57',
nghttp2: '1.25.0',
openssl: '1.0.2o',
icu: '60.1',
unicode: '10.0',
cldr: '32.0',
tz: '2017c' }
~/dev/node-cc2 [v8.x] $ node --icu-data-dir=./deps/icu-small/source/data/in
node: could not initialize ICU (check NODE_ICU_DATA or --icu-data-dir parameters)
~/dev/node-cc2 [v8.x] $ node --icu-data-dir=./deps/icu/source/data/in
> new Intl.DateTimeFormat().format()
'4/18/2018' |
Then, I think the error result from u_init in the small-icu case is misleading for your purposes. (the u_init API docs kind of cover this, but as I said they could be more explicit for the 'customized data' case.) A better test might be ensuring that the error is something like |
|
We explicitly fail for U_FILE_ACCESS_ERROR because that is what is reported when calling u_init for Windows Kit ICU in a low-memory scenario, because it can't mmap the data file. So, we don't want to ignore that error in this specific use case, possibly unless we are in node, but as I mentioned above, we don't plumb any information through about whether chakracore is building or running in the context of node right now. I'd rather ignore all u_init errors for non-Windows Kit ICU because they will be caught by later status code checks anyways, so it doesn't put the engine in a bad state. With that being said, does ulocdata_getCLDRVersion actually load the data file? In other words, could we use it instead of u_init for this specific use case of ensuring that the data is in memory before continuing. |
|
Yes ulocdata_getCLDRVersion actually loads the data file. So if it returns a CLDR version you're good (should work in small icu). |
|
Awesome. I will test that out and update this PR if it works for all of these scenarios. Thanks! |
|
Thank you for the idea @srl295! The CLDR version should hopefully always be there, even if there are no converters at all, so this should ideally work even with a super-small data file. |
jefgen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using ulocdata_getCLDRVersion should work everywhere. :)
| INTL_TRACE("Using CLDR version %d.%d.%d.%d", cldrVersion[0], cldrVersion[1], cldrVersion[2], cldrVersion[3]); | ||
| } | ||
|
|
||
| AssertOrFailFastMsg(U_SUCCESS(status), "u_init returned non-OOM failure"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This still says u_init. :)
kfarnung
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verified in node-chakracore, LGTM.
9e99307 to
8193048
Compare
8193048 to
f99c276
Compare
…Kit ICU Merge pull request #5011 from jackhorton:icu/init-windows-kit Third attempt at this after #4984 and #5001, but I have actually confirmed that this makes Node-ChakraCore's chakracore-master branch build, launch, and run Intl code successfully once more. This definitely isn't the best solution, however if we refer to https://github.com/nodejs/node-chakracore/blob/d94b22785fb6ab7dde77cb13b7e95e958e581375/src/node_i18n.cc#L568-L581, we can see that Node somewhat explicitly also does not call u_init if they call udata_setCommonData.
Third attempt at this after #4984 and #5001, but I have actually confirmed that this makes Node-ChakraCore's chakracore-master branch build, launch, and run Intl code successfully once more. This definitely isn't the best solution, however if we refer to https://github.com/nodejs/node-chakracore/blob/d94b22785fb6ab7dde77cb13b7e95e958e581375/src/node_i18n.cc#L568-L581, we can see that Node somewhat explicitly also does not call u_init if they call udata_setCommonData.