@staticfloat
Member

This is the locale recommended by Arch Linux, so let's look for it as well.

@staticfloat staticfloat added the test This change adds or pertains to unit tests label Jun 25, 2022
@staticfloat
Member Author

This, combined with a rootfs image bump that includes the changes from JuliaCI/rootfs-images#208, will help us silence CI warnings such as https://buildkite.com/julialang/julia-master/builds/13227#01819c05-81fc-4bd9-bb58-7f7872193d6c/560-843

@vtjnash
Member

vtjnash commented Jun 26, 2022

This would sort of defeat the comment there, which says not to use Unicode here.

@staticfloat
Member Author

I was confused by that comment, because the output of our tests seems to be identical across every Korean locale I can find:

julia> korloc = ["ko_KR.UTF-8", "ko_KR.EUC-KR", "ko_KR.CP949", "ko_KR.949", "Korean_Korea.949"]
       withlocales(korloc) do locale
           s = Libc.strftime(0.0)
           @info(locale, s, bytes2hex(sha256(s)))
       end
┌ Info: ko_KR.UTF-8
│   s = "1970년 01월 01일 (목) 오전 12시 00분 00초"
└   bytes2hex(sha256(s)) = "d0117d51da3a9a0953e0b504207190950c3d664688743a103c95584148f99092"
┌ Info: ko_KR.EUC-KR
│   s = "1970년 01월 01일 (목) 오전 12시 00분 00초"
└   bytes2hex(sha256(s)) = "d0117d51da3a9a0953e0b504207190950c3d664688743a103c95584148f99092"
┌ Info: ko_KR.cp949
│   s = "1970년 01월 01일 (목) 오전 12시 00분 00초"
└   bytes2hex(sha256(s)) = "d0117d51da3a9a0953e0b504207190950c3d664688743a103c95584148f99092"

This was generated with a custom locale.gen that looks like the following:

ko_KR.EUC-KR EUC-KR
ko_KR.UTF-8 UTF-8
ko_KR.cp949 CP949

If we want to test output that is explicitly not UTF-8, I think we need a different test, because these locales all produce exactly the same UTF-8 string.
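To make the distinction concrete, here is a minimal sketch of what "not UTF-8" would look like at the byte level for the single character "년". The EUC-KR byte pair below is written out by hand as an illustrative assumption; it is not produced by any call in this thread, and the variable names are hypothetical.

```julia
# Bytes for the single character "년" (U+B144) in two encodings.
# The EUC-KR pair is hand-written for illustration (an assumption),
# not the output of any locale call shown in this conversation.
euckr_bytes = UInt8[0xb3, 0xe2]
utf8_bytes  = UInt8[0xeb, 0x85, 0x84]

# String(::Vector{UInt8}) takes ownership of its argument, so copy first.
raw        = String(copy(euckr_bytes))  # raw EUC-KR bytes wrapped in a String
transcoded = String(copy(utf8_bytes))   # the same character as UTF-8

@show isvalid(raw)          # is the untranscoded form valid UTF-8?
@show isvalid(transcoded)   # is the transcoded form valid UTF-8?
@show transcoded == "년"
```

If `Libc.strftime` handed back the first form, the hashes above would differ between locales; identical hashes suggest the output was already converted to UTF-8 in every case.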

@vtjnash
Member

vtjnash commented Jun 27, 2022

Yes, exact equivalence is rather the point of the test (#27273).

@staticfloat staticfloat changed the title Allow looking for ko_KR.UTF-8 locale as well Clarify non-Unicode Korean tests Jun 29, 2022
Add explanation and extra tests to ensure that our non-Unicode
transcription works properly and outputs a reasonable UTF-8 string.

Note that on musl, `setlocale()` never fails, and so we cannot test
this properly.
@staticfloat
Member Author

Alright, I've switched this PR to instead clarify some things and add tests (where appropriate) to ensure that we don't just get en_US.UTF-8 back. I've also updated the Linux testers to use the new rootfs images that include ko_KR.EUC-KR baked in. The Linux testers should no longer print the warning that these tests are being skipped.
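As a rough illustration of the kind of check described here (not the test that was actually merged), one could assert that a formatted date is valid UTF-8 and also contains non-ASCII characters, which an English fallback would not. The helper name `looks_localized` and the English sample string are hypothetical.

```julia
# Hypothetical helper, not part of the PR: a formatted date counts as
# "localized" if it is valid UTF-8 and has at least one non-ASCII character.
looks_localized(s::String) = isvalid(s) && !isascii(s)

# Output quoted earlier in this thread for the Korean locales.
korean = "1970년 01월 01일 (목) 오전 12시 00분 00초"

# Assumed shape of an English-style fallback; the exact text is illustrative.
fallback = "Thu 01 Jan 1970 12:00:00 AM UTC"

@show looks_localized(korean)
@show looks_localized(fallback)
```

A check like this only guards against silently falling back to an ASCII-only locale; it says nothing about whether the Korean text itself is correct, which is why comparing against the expected string remains the stronger test.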

@staticfloat
Member Author

@vtjnash Does this look better now? Thanks for helping me to understand the purpose behind this test. I've done my best to encode that in comments in the source as well.

@staticfloat staticfloat merged commit fec6951 into master Jun 30, 2022
@staticfloat staticfloat deleted the sf/ko_kr_utf8 branch June 30, 2022 17:21