3 changes: 2 additions & 1 deletion Lib/encodings/__init__.py
@@ -61,7 +61,8 @@ def normalize_encoding(encoding):
         if c.isalnum() or c == '.':
             if punct and chars:
                 chars.append('_')
-            chars.append(c)
+            if c.isascii():
+                chars.append(c)
Member

I wanted to ask you to add a ".. versionchanged:: 3.10" entry in the documentation, but then I noticed that the encodings module was never documented! Oh!

Member Author

If end users are going to use this function or module, I can try to write the documentation, but I will need some time to do it :)

Member

It can and must be addressed in a separate PR. The lack of documentation should not hold up this change.

Member Author

ok, copy that.

             punct = False
         else:
             punct = True
19 changes: 18 additions & 1 deletion Lib/test/test_codecs.py
@@ -3417,7 +3417,7 @@ def test_rot13_func(self):

 class CodecNameNormalizationTest(unittest.TestCase):
     """Test codec name normalization"""
-    def test_normalized_encoding(self):
+    def test_codecs_lookup(self):
         FOUND = (1, 2, 3, 4)
         NOT_FOUND = (None, None, None, None)
         def search_function(encoding):
@@ -3439,6 +3439,23 @@ def search_function(encoding):
         self.assertEqual(NOT_FOUND, codecs.lookup('BBB.8'))
         self.assertEqual(NOT_FOUND, codecs.lookup('a\xe9\u20ac-8'))
+
+    def test_encodings_normalize_encoding(self):
+        # encodings.normalize_encoding() ignores non-ASCII letters.
+        out = encodings.normalize_encoding('utf_8')
+        self.assertEqual(out, 'utf_8')
+        out = encodings.normalize_encoding('utf\xE9\u20AC\U0010ffff-8')
+        self.assertEqual(out, 'utf_8')
+        out = encodings.normalize_encoding('utf 8')
+        self.assertEqual(out, 'utf_8')
+        # encodings.normalize_encoding() doesn't convert
+        # characters to lower case.
+        out = encodings.normalize_encoding('UTF 8')
Member

Would you mind adding a comment to explain that the function does not convert upper case letters to lower case, just to make the purpose of this test even more explicit?

Member Author

@shihai1991 shihai1991 Oct 12, 2020

To be honest, I don't know exactly how to explain it~
I found a case in https://github.com/python/cpython/blob/master/Lib/locale.py#L358.
It looks like it would be fine to update encodings.normalize_encoding() to convert to lower case.

Member Author

just describe the fact, Lol~

Member Author

Maybe we can enhance this encodings.normalize_encoding()? I am not sure~

+        self.assertEqual(out, 'UTF_8')
+        out = encodings.normalize_encoding('utf.8')
+        self.assertEqual(out, 'utf.8')
+        out = encodings.normalize_encoding('utf...8')
+        self.assertEqual(out, 'utf...8')


 if __name__ == "__main__":
     unittest.main()
@@ -0,0 +1 @@
+:func:`encodings.normalize_encoding` now ignores non-ASCII letters.
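
For readers who want to try the behaviour described by this NEWS entry without a patched interpreter, here is a minimal standalone sketch. The name `normalize_sketch` is hypothetical, and the function scaffolding around the loop (the initial `chars`/`punct` values and the final join) is an assumption; only the loop body is taken from the hunk in this PR. It is an illustration, not the stdlib source.

```python
def normalize_sketch(encoding: str) -> str:
    """Illustrative re-implementation of the patched loop (not the stdlib code)."""
    chars = []      # characters kept for the normalized name
    punct = False   # True once a run of punctuation/whitespace has been seen
    for c in encoding:
        if c.isalnum() or c == '.':
            # Collapse a preceding run of punctuation into one underscore,
            # but never emit a leading underscore.
            if punct and chars:
                chars.append('_')
            # The change in this PR: keep only ASCII characters.
            if c.isascii():
                chars.append(c)
            punct = False
        else:
            punct = True
    return ''.join(chars)


if __name__ == "__main__":
    # Same inputs as test_encodings_normalize_encoding in this PR.
    for name in ('utf_8', 'utf\xE9\u20AC\U0010ffff-8', 'utf 8',
                 'UTF 8', 'utf.8', 'utf...8'):
        print(repr(name), '->', repr(normalize_sketch(name)))
```

Note that case is preserved and dots are kept verbatim, which matches the expectations in the test added above; lower-casing, as discussed in the review, would be a separate change.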