Skip to content

Conversation

@jackhorton
Copy link
Contributor

Third attempt at this after #4984 and #5001, but I have actually confirmed that this makes Node-ChakraCore's chakracore-master branch build, launch, and run Intl code successfully once more. This definitely isn't the best solution, however if we refer to https://github.com/nodejs/node-chakracore/blob/d94b22785fb6ab7dde77cb13b7e95e958e581375/src/node_i18n.cc#L568-L581, we can see that Node somewhat explicitly also does not call u_init if they call udata_setCommonData.

@jackhorton
Copy link
Contributor Author

@jefgen the docs say u_init can be called multiple times with no effect, however @srl295 mentions in nodejs/node#13053 that this is not the case. If u_init can't be guaranteed to no-op on the nth call for n > 1, we may need to look into cases where Intl can get initialized twice, such as across different iframes or node-land vm.runInContext

@jefgen
Copy link
Collaborator

jefgen commented Apr 18, 2018

As you noted in the comments in the code change, we don't expose the API udata_setCommonData in the in-box/SDK version of ICU, so you can't get into a situation where an app might have called that before u_init is called.
Though it's not clear to me from reading the comment thread with @srl295 (linked above) if he is saying that "ICU only initializes once" [internally], or if he is saying "you should only call u_init once". (I think it is the former..?)
If ICU data is already initialized (via udata_setCommonData) then I would have expected u_init to be a no-op.
Are you getting back U_FILE_ACCESS_ERROR from the u_init call if udata_setCommonData is called first?

Edit:
Re-reading the comment from @srl295, this sentence is probably the most relevant:
"The failure from u_init doesn't mean it wasn't initialized, technically, but that when it initialized, it couldn't find even a minimal data file."

If node doesn't ship with any data file (and only provides it via the udata_setCommonData function), then this error would make more sense.

@jackhorton
Copy link
Contributor Author

That is correct, we were getting back U_FILE_ACCESS_ERROR in node when the small data was used and set with udata_setCommonData. There is no particularly good way to plumb the information about if node has decided to use small data through when compiling and running ChakraCore, so I think I am happy with just keeping this code specific to windows kit ICU.

@srl295
Copy link

srl295 commented Apr 18, 2018

@jackhorton I said:

I don't know if ICU supports initializing twice

No, it doesn't support that.

The question was whether you could call u_init after changing the load path, etc to re initialize with different results. The answer to that is no. Your question is different as @jefgen noted.

u_init can't be guaranteed to no-op on the nth call for n > 1

u_init() is guaranteed to be a no-op on the nth call for n>1. (At least until/unless u_cleanup() is called.)

Hope this helps.

@jackhorton
Copy link
Contributor Author

Ah, I understand -- sorry for the confusion. That makes things quite a bit easier on our end. Thanks!

@jefgen
Copy link
Collaborator

jefgen commented Apr 18, 2018

Does the "small-icu" data for Node include any converters (code-pages)? If not then then U_FILE_ACCESS_ERROR returned from u_init might make sense as it is essentially trying to load the list of converters.

@jefgen
Copy link
Collaborator

jefgen commented Apr 18, 2018

Thanks for the reply @srl295 ! 👍

// to Windows 10 ICU because we know udata_setCommonData will not be called before then (since it is not allowed). This
// change primarily exists to guard against OOMs in ICU being treated as unique bugs by Crawler, so it really only *needs*
// to be enabled when we are using Windows 10 ICU, anyways.
// TODO(jahorto): investigate why u_init can't be called after udata_setCommonData
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand this. u_init should be called after udata_setCommonData (that's the design and what Node.js does). The U_FILE_ACCESS_ERROR may be something wrong with the data/configuration. Try compiling with -DUDATA_DEBUG=1 and make sure your terminal's scrollback buffer is cranked way up…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I am missing something, I didn't see where node called u_init in the same path as calling udata_setCommonData. so I assumed it was intentional.

@jefgen
Copy link
Collaborator

jefgen commented Apr 18, 2018

Note: I still think that true "out-of-memory" errors when loading data should be reported as such, rather than being converted into other errors like U_MISSING_RESOURCE_ERROR

FYI: I filed a ICU ticket for this: http://bugs.icu-project.org/trac/ticket/13712

@srl295
Copy link

srl295 commented Apr 18, 2018

ACtually u_init docs say:

Multiple calls to u_init() cause no harm, aside from the small amount of time required.

Does the "small-icu" data for Node include any converters (code-pages)? If not then then U_FILE_ACCESS_ERROR returned from u_init might make sense as it is essentially trying to load the list of converters.

Yes. That's a good point though. u_init's return code is based on assumptions about what data ought to be available. A FILE ACCESS ERROR may just mean the data is modified from stock ICU, not that anything necessarily went wrong with ICU's initialization. This probably should be documented.

Could you try your test but with a full ICU dataset? Definitely if you have a 'stubdata dataset' (the 0-length data) you might get a file access error.

@jackhorton
Copy link
Contributor Author

So Node-ChakraCore with full-icu works properly without this change. A step further, this issue is definitely with calling u_init on the data file in small-icu -- this is with the released version of node-v8 8.11.1.

~/dev/node-cc2 [v8.x] $ node --version
v8.11.1
~/dev/node-cc2 [v8.x] $ node -pe process.versions
{ http_parser: '2.8.0',
  node: '8.11.1',
  v8: '6.2.414.50',
  uv: '1.19.1',
  zlib: '1.2.11',
  ares: '1.10.1-DEV',
  modules: '57',
  nghttp2: '1.25.0',
  openssl: '1.0.2o',
  icu: '60.1',
  unicode: '10.0',
  cldr: '32.0',
  tz: '2017c' }
~/dev/node-cc2 [v8.x] $ node --icu-data-dir=./deps/icu-small/source/data/in
node: could not initialize ICU (check NODE_ICU_DATA or --icu-data-dir parameters)
~/dev/node-cc2 [v8.x] $ node --icu-data-dir=./deps/icu/source/data/in
> new Intl.DateTimeFormat().format()
'4/18/2018'

@srl295
Copy link

srl295 commented Apr 18, 2018

So Node-ChakraCore with full-icu works properly without this change. A step further, this issue is definitely with calling u_init on the data file in small-icu

Then, I think the error result from u_init in the small-icu case is misleading for your purposes. (the u_init API docs kind of cover this, but as I said they could be more explicit for the 'customized data' case.)

A better test might be ensuring that the error is something like (U_SUCCESS(status) || (status== U_FILE_ACCESS_ERROR)). Or even calling ulocdata_getCLDRVersion()

@jackhorton
Copy link
Contributor Author

We explicitly fail for U_FILE_ACCESS_ERROR because that is what is reported when calling u_init for Windows Kit ICU in a low-memory scenario, because it can't mmap the data file. So, we don't want to ignore that error in this specific use case, possibly unless we are in node, but as I mentioned above, we don't plumb any information through about whether chakracore is building or running in the context of node right now. I'd rather ignore all u_init errors for non-Windows Kit ICU because they will be caught by later status code checks anyways, so it doesn't put the engine in a bad state.

With that being said, does ulocdata_getCLDRVersion actually load the data file? In other words, could we use it instead of u_init for this specific use case of ensuring that the data is in memory before continuing.

@srl295
Copy link

srl295 commented Apr 18, 2018

Yes ulocdata_getCLDRVersion actually loads the data file. So if it returns a CLDR version you're good (should work in small icu).

@jackhorton
Copy link
Contributor Author

Awesome. I will test that out and update this PR if it works for all of these scenarios. Thanks!

@jefgen
Copy link
Collaborator

jefgen commented Apr 19, 2018

Thank you for the idea @srl295! The CLDR version should hopefully always be there, even if there are no converters at all, so this should ideally work even with a super-small data file.

Copy link
Collaborator

@jefgen jefgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using ulocdata_getCLDRVersion should work everywhere. :)

INTL_TRACE("Using CLDR version %d.%d.%d.%d", cldrVersion[0], cldrVersion[1], cldrVersion[2], cldrVersion[3]);
}

AssertOrFailFastMsg(U_SUCCESS(status), "u_init returned non-OOM failure");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This still says u_init. :)

Copy link
Contributor

@kfarnung kfarnung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified in node-chakracore, LGTM.

@jackhorton jackhorton force-pushed the icu/init-windows-kit branch 2 times, most recently from 9e99307 to 8193048 Compare April 22, 2018 01:04
@jackhorton jackhorton force-pushed the icu/init-windows-kit branch from 8193048 to f99c276 Compare April 22, 2018 01:05
@chakrabot chakrabot merged commit f99c276 into chakra-core:master Apr 22, 2018
chakrabot pushed a commit that referenced this pull request Apr 22, 2018
…Kit ICU

Merge pull request #5011 from jackhorton:icu/init-windows-kit

Third attempt at this after #4984 and #5001, but I have actually confirmed that this makes Node-ChakraCore's chakracore-master branch build, launch, and run Intl code successfully once more. This definitely isn't the best solution, however if we refer to https://github.com/nodejs/node-chakracore/blob/d94b22785fb6ab7dde77cb13b7e95e958e581375/src/node_i18n.cc#L568-L581, we can see that Node somewhat explicitly also does not call u_init if they call udata_setCommonData.
@jackhorton jackhorton deleted the icu/init-windows-kit branch April 30, 2018 16:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants