

@shu66y shu66y commented Jul 4, 2017

We had trouble writing UTF-8 file names correctly within a tar archive. I haven't considered reading yet.

I certify that I own, and have sufficient rights to contribute, all source code and related material intended to be compiled or integrated with the source code for the SharpZipLib open source product (the "Contribution"). My Contribution is licensed under the MIT License.

@piksel (Member)

piksel commented Jul 1, 2018

I have been going through how the tar headers are parsed for #121, and UTF-8-encoded names are probably better. The specification suggests using extended headers for encodings other than ASCII, but GNU tar writes the UTF-8-encoded bytes as the name, so I guess we should too.
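To make the GNU tar behaviour concrete, here is a minimal sketch, assuming the ustar layout in which the name is a 100-byte, NUL-padded header field. `WriteNameField` and `ReadNameField` are hypothetical helpers for illustration, not SharpZipLib's API, and the sketch rejects names that do not fit rather than truncating them.

```csharp
using System;
using System.Text;

// Hypothetical helpers (not SharpZipLib's API): store a file name as raw UTF-8
// bytes in a fixed-size, NUL-padded tar header field, the way GNU tar does.
const int NameFieldLength = 100; // size of the ustar "name" field in bytes

byte[] WriteNameField(string name)
{
    byte[] nameBytes = Encoding.UTF8.GetBytes(name);
    if (nameBytes.Length > NameFieldLength)
    {
        // Longer names need a GNU long-name entry or a pax extended header instead.
        throw new ArgumentException(
            $"Name needs {nameBytes.Length} bytes but the field holds {NameFieldLength}.");
    }

    var field = new byte[NameFieldLength]; // zero-initialized, so the padding is NUL
    Array.Copy(nameBytes, field, nameBytes.Length);
    return field;
}

string ReadNameField(byte[] field)
{
    int end = Array.IndexOf(field, (byte)0); // the name ends at the first NUL, if any
    if (end < 0) end = field.Length;
    return Encoding.UTF8.GetString(field, 0, end);
}

string sample = "Gr\u00fc\u00dfe/\u65e5\u672c.txt"; // "Grüße/日本.txt"
byte[] header = WriteNameField(sample);

Console.WriteLine($"{sample.Length} chars, {Encoding.UTF8.GetByteCount(sample)} UTF-8 bytes"); // 12 chars, 18 UTF-8 bytes
Console.WriteLine(ReadNameField(header) == sample); // True
```

The sample name is 12 characters but 18 bytes in UTF-8, which is why the field limit has to be checked in bytes rather than characters.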

piksel added a commit to piksel/SharpZipLib that referenced this pull request Jul 1, 2018
int i;

for (i = 0; i < length && nameOffset + i < name.Length; ++i)
byte[] nameBytes = Encoding.UTF8.GetBytes(name);

I doubt we want BOMs in the name

Suggested change
byte[] nameBytes = Encoding.UTF8.GetBytes(name);
byte[] nameBytes = new UTF8Encoding(false).GetBytes(name);
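For comparison, a small sketch of both lines from the suggested change. As far as I know, `Encoding.GetBytes` never prepends a byte order mark, even on `Encoding.UTF8`; the BOM is only returned by `GetPreamble()` and written by stream-based APIs such as `StreamWriter`. So both variants should yield identical name bytes, and `new UTF8Encoding(false)` mainly guards against a BOM if the same encoding instance is later handed to a stream writer. The sample file name is arbitrary.

```csharp
using System;
using System.Linq;
using System.Text;

string entryName = "caf\u00e9.txt"; // "café.txt"

// The two variants from the suggested change.
byte[] withDefault = Encoding.UTF8.GetBytes(entryName);
byte[] withoutBom = new UTF8Encoding(false).GetBytes(entryName);

// GetBytes encodes only the characters; neither instance prepends EF BB BF.
Console.WriteLine(withDefault.SequenceEqual(withoutBom)); // True
Console.WriteLine(withDefault.Length); // 9: "caf" + 2 bytes for "é" + ".txt"

// The BOM lives in the preamble, which stream writers emit before the content.
Console.WriteLine(Encoding.UTF8.GetPreamble().Length); // 3
Console.WriteLine(new UTF8Encoding(false).GetPreamble().Length); // 0
```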

@itn3000 (Contributor)

itn3000 commented Jun 23, 2019

This PR's implementation may truncate long file names (over 100 bytes in UTF-8) that contain non-ASCII characters, because characters at U+0080 and above take more than one byte in UTF-8.
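One way to avoid splitting a character is to cut at the byte limit and then back up to the start of the character that straddles it. A minimal sketch, assuming a 100-byte name field; `TruncateUtf8` is a hypothetical helper and not necessarily how #364 handles it. It only keeps code points whole: a combining sequence such as "e" followed by U+0301 could still be separated.

```csharp
using System;
using System.Text;

// Hypothetical helper: return the longest prefix of the name's UTF-8 bytes
// that fits in maxBytes without splitting a multi-byte character.
byte[] TruncateUtf8(string name, int maxBytes)
{
    byte[] bytes = Encoding.UTF8.GetBytes(name);
    if (bytes.Length <= maxBytes) return bytes;

    int cut = maxBytes;
    // UTF-8 continuation bytes look like 10xxxxxx; if the byte at the cut is one,
    // a character straddles the limit, so back up to that character's lead byte.
    while (cut > 0 && (bytes[cut] & 0xC0) == 0x80) cut--;

    byte[] result = new byte[cut];
    Array.Copy(bytes, result, cut);
    return result;
}

// 99 ASCII letters followed by "é" (2 bytes): 101 bytes in total.
string longName = new string('a', 99) + "\u00e9";

// Naive byte truncation keeps only the first byte of "é".
byte[] naive = new byte[100];
Array.Copy(Encoding.UTF8.GetBytes(longName), naive, 100);
string decodedNaive = Encoding.UTF8.GetString(naive);

Console.WriteLine(decodedNaive[decodedNaive.Length - 1] == '\uFFFD'); // True: the half character decodes as U+FFFD
Console.WriteLine(TruncateUtf8(longName, 100).Length); // 99: "é" is dropped whole
```

Backing up to a lead byte drops the straddling character entirely instead of storing an invalid partial sequence that readers would decode as a replacement character.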

@Numpsy (Contributor)

Numpsy commented Jun 20, 2020

Is this obsoleted by #364 being merged?

@piksel (Member)

piksel commented Jun 22, 2020

Indeed. Thanks.

@piksel piksel closed this Jun 22, 2020