Clarify non-Unicode Korean tests #45815

staticfloat · 2022-06-25T18:43:17Z

This is the locale recommended by Arch Linux, so let's look for it as well.

staticfloat · 2022-06-25T19:23:03Z

This, combined with a rootfs image bump that includes the bits from JuliaCI/rootfs-images#208, will help us silence CI warnings such as https://buildkite.com/julialang/julia-master/builds/13227#01819c05-81fc-4bd9-bb58-7f7872193d6c/560-843

vtjnash · 2022-06-26T03:53:55Z

This would sort of defeat the comment there which says not to use Unicode here

staticfloat · 2022-06-27T14:59:59Z

I was confused by that comment, because the output of our tests seems to be identical across every Korean locale I can find:

julia> korloc = ["ko_KR.UTF-8", "ko_KR.EUC-KR", "ko_KR.CP949", "ko_KR.949", "Korean_Korea.949"]
       withlocales(korloc) do locale
           s = Libc.strftime(0.0)
           @info(locale, s, bytes2hex(sha256(s)))
       end
┌ Info: ko_KR.UTF-8
│   s = "1970년 01월 01일 (목) 오전 12시 00분 00초"
└   bytes2hex(sha256(s)) = "d0117d51da3a9a0953e0b504207190950c3d664688743a103c95584148f99092"
┌ Info: ko_KR.EUC-KR
│   s = "1970년 01월 01일 (목) 오전 12시 00분 00초"
└   bytes2hex(sha256(s)) = "d0117d51da3a9a0953e0b504207190950c3d664688743a103c95584148f99092"
┌ Info: ko_KR.cp949
│   s = "1970년 01월 01일 (목) 오전 12시 00분 00초"
└   bytes2hex(sha256(s)) = "d0117d51da3a9a0953e0b504207190950c3d664688743a103c95584148f99092"

This was generated with a locale.gen that I made that looks like the following:

ko_KR.EUC-KR EUC-KR
ko_KR.UTF-8 UTF-8
ko_KR.cp949 CP949

If we want to test codepoint output that is explicitly not UTF-8, I think we need a different test, because these all output exactly the same UTF-8.

vtjnash · 2022-06-27T17:28:49Z

yes, exact equivalence would be rather the point of the test (#27273)
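
To illustrate why identical hashes across these locales are meaningful, here is a minimal sketch. It assumes the C library hands back bytes in the locale's own encoding, which then have to be transcoded before they can match the UTF-8 result. The byte values for "가" are written out by hand for illustration and do not come from the PR.

```julia
using SHA  # standard library; provides sha256

# The same character, "가" (U+AC00), in two encodings.
# These byte vectors are illustrative values, not taken from the PR.
utf8_bytes  = UInt8[0xea, 0xb0, 0x80]   # UTF-8 encoding of "가"
euckr_bytes = UInt8[0xb0, 0xa1]         # EUC-KR encoding of "가"

# Julia string literals are UTF-8, so the code units match utf8_bytes.
@show codeunits("가") == utf8_bytes

# Raw EUC-KR bytes are not valid UTF-8: 0xb0 cannot start a sequence.
@show isvalid(String, euckr_bytes)

# Different bytes hash differently, so equal hashes across locales
# indicate that every locale's output ended up as the same UTF-8.
@show bytes2hex(sha256(utf8_bytes)) == bytes2hex(sha256(euckr_bytes))
```

If the non-Unicode locales returned their raw bytes untouched, the hashes printed above for `ko_KR.EUC-KR` and `ko_KR.cp949` could not equal the `ko_KR.UTF-8` one.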

Add explanation and extra tests to ensure that our non-unicode transcription works properly and outputs a reasonable UTF-8 string. Note that on musl, `setlocales()` never fails, and so we cannot test this properly.

staticfloat · 2022-06-29T17:40:56Z

Alright, I've switched this PR to instead clarify some things and add tests (where appropriate) to ensure that we don't just get en_US.UTF-8 back. I've also updated the Linux testers to use the new rootfs images that have ko_KR.EUC-KR baked in, so the Linux testers should no longer print the warning that these tests are being skipped.
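
As a rough sketch of what a "didn't just fall back to English" check could look like: the helper name `looks_like_korean_utf8` is hypothetical, and the actual tests added in the PR may be written differently. The English sample string is likewise only an assumed example of fallback output.

```julia
# Hypothetical helper for illustration; not the PR's actual test code.
# A Korean strftime result should be valid UTF-8, contain non-ASCII
# characters (the Hangul), and still carry the expected year.
function looks_like_korean_utf8(s::String)
    return isvalid(s) && !isascii(s) && occursin("1970", s)
end

korean  = "1970년 01월 01일 (목) 오전 12시 00분 00초"  # output quoted earlier in this thread
english = "Thu Jan  1 00:00:00 1970"                   # assumed shape of an English fallback

@show looks_like_korean_utf8(korean)
@show looks_like_korean_utf8(english)
```

The `!isascii` check is what separates a real Korean result from a silent fallback, since an English date string is pure ASCII and would pass the other two conditions.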

staticfloat · 2022-06-29T19:02:48Z

Confirmed working. Compare the previous master build:

https://buildkite.com/julialang/julia-master/builds/13308#0181ad71-e4f7-4efe-be37-0d90aa7357c3/568-842

Versus this build:

https://buildkite.com/julialang/julia-master/builds/13325#0181b08b-f047-42c0-a3c3-500d6f325827/564-838

Windows builds also seem good.

staticfloat · 2022-06-29T19:04:28Z

@vtjnash Does this look better now? Thanks for helping me to understand the purpose behind this test. I've done my best to encode that in comments in the source as well.

staticfloat added the `test` label (This change adds or pertains to unit tests) Jun 25, 2022

staticfloat force-pushed the sf/ko_kr_utf8 branch from 5a743a8 to bded0be June 28, 2022 16:58

staticfloat changed the title ~~Allow looking for ko_KR.UTF-8 locale as well~~ Clarify non-Unicode Korean tests Jun 29, 2022

Clarify non-unicode korean tests

c78a8db

Add explanation and extra tests to ensure that our non-unicode transcription works properly and outputs a reasonable UTF-8 string. Note that on musl, `setlocales()` never fails, and so we cannot test this properly.

staticfloat force-pushed the sf/ko_kr_utf8 branch from bded0be to c78a8db June 29, 2022 17:38

vtjnash approved these changes Jun 30, 2022

staticfloat merged commit fec6951 into master Jun 30, 2022

staticfloat deleted the sf/ko_kr_utf8 branch June 30, 2022 17:21
