writing correct Tar UTF8 filenames #182

shu66y · 2017-07-04T10:19:49Z

We had a problem writing UTF-8 filenames into a tar archive correctly. I haven't looked at reading yet.

I certify that I own, and have sufficient rights to contribute, all source code and related material intended to be compiled or integrated with the source code for the SharpZipLib open source product (the "Contribution"). My Contribution is licensed under the MIT License.

piksel · 2018-07-01T17:56:02Z

I have been going through how the tar headers are parsed for #121, and UTF-8-encoded names are probably the better choice. The specification suggests using extended headers for encodings other than ASCII, but GNU tar writes the UTF-8-encoded bytes directly as the name, so I guess we should too.
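A minimal sketch of the two approaches, assuming the 100-byte name field of a ustar header. The class and method names are hypothetical, not SharpZipLib API. `NameAsLowBytes` casts each `char` to a `byte`, similar in spirit to the per-character loop being replaced, and keeps only the low 8 bits. `NameAsUtf8` copies the UTF-8 bytes and NUL-pads the rest, which is the GNU tar behaviour described above. To keep the sketch short, long names are rejected rather than truncated.

```csharp
using System;
using System.Text;

public static class TarNameSketch
{
    // Size of the name field in a ustar header.
    public const int NameFieldLength = 100;

    // One byte per char, keeping only the low 8 bits of each UTF-16 code unit.
    // Any character outside Latin-1 is silently corrupted.
    public static byte[] NameAsLowBytes(string name)
    {
        var field = new byte[NameFieldLength];
        for (int i = 0; i < NameFieldLength && i < name.Length; ++i)
            field[i] = (byte)name[i];
        return field;
    }

    // GNU tar style: store the UTF-8 bytes of the name, NUL-padded.
    // Hypothetical simplification: names that do not fit are rejected.
    public static byte[] NameAsUtf8(string name)
    {
        byte[] nameBytes = Encoding.UTF8.GetBytes(name);
        if (nameBytes.Length > NameFieldLength)
            throw new ArgumentException("Name too long for the ustar name field.", nameof(name));

        var field = new byte[NameFieldLength]; // new arrays are zero-filled, i.e. NUL-padded
        Array.Copy(nameBytes, field, nameBytes.Length);
        return field;
    }
}
```

For example, the euro sign (U+20AC) comes out as the single byte 0xAC from `NameAsLowBytes`, while `NameAsUtf8` stores the three bytes E2 82 AC.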

piksel · 2019-06-15T21:03:22Z

src/ICSharpCode.SharpZipLib/Tar/TarHeader.cs

 			int i;

-			for (i = 0; i < length && nameOffset + i < name.Length; ++i)
+			byte[] nameBytes = Encoding.UTF8.GetBytes(name);


I doubt we want BOMs in the name

Suggested change

-byte[] nameBytes = Encoding.UTF8.GetBytes(name);
+byte[] nameBytes = new UTF8Encoding(false).GetBytes(name);
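For reference, .NET's `Encoding.GetBytes` never prepends a byte order mark. The BOM is only exposed through `GetPreamble()`. A small check (the `BomCheck` class is hypothetical) shows that `Encoding.UTF8` and `System.Text.UTF8Encoding` constructed with `false` produce identical bytes for a name and differ only in their preamble.

```csharp
using System.Linq;
using System.Text;

public static class BomCheck
{
    // True when both encoders produce byte-for-byte identical output for the name.
    public static bool SameBytes(string name)
    {
        byte[] viaStatic = Encoding.UTF8.GetBytes(name);
        byte[] viaNoBom = new UTF8Encoding(false).GetBytes(name);
        return viaStatic.SequenceEqual(viaNoBom);
    }

    // The BOM (EF BB BF) is reported here, never in GetBytes() output.
    public static int PreambleLength(Encoding encoding) => encoding.GetPreamble().Length;
}
```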

itn3000 · 2019-06-23T18:18:39Z

This PR's implementation may cut off a long filename (over 100 bytes in UTF-8) that contains non-ASCII characters, because characters at or above U+0080 take more than one byte in UTF-8.
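One way to soften this, sketched under the assumption that truncating the name is acceptable (as opposed to emitting a GNU long-name entry or a PAX extended header), is to shorten the name only at a UTF-8 character boundary. That way the field never ends in a partial multibyte sequence. The helper `TruncateUtf8` is hypothetical, not part of SharpZipLib.

```csharp
using System;
using System.Text;

public static class Utf8Truncation
{
    // Returns the longest prefix of the UTF-8 encoding of `name` that fits in
    // `maxBytes` bytes without splitting a multibyte sequence.
    public static byte[] TruncateUtf8(string name, int maxBytes)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(name);
        if (bytes.Length <= maxBytes)
            return bytes;

        int cut = maxBytes;
        // UTF-8 continuation bytes have the bit pattern 10xxxxxx.
        // Back up until the byte at `cut` starts a new character, so the
        // prefix [0, cut) ends on a complete character.
        while (cut > 0 && (bytes[cut] & 0xC0) == 0x80)
            cut--;

        var result = new byte[cut];
        Array.Copy(bytes, result, cut);
        return result;
    }
}
```

With 99 ASCII characters followed by `ü` (101 bytes in total), a 100-byte limit yields the 99 ASCII bytes instead of ending with a lone 0xC3 lead byte. The tail of the name is still lost, so this only avoids producing invalid UTF-8. It does not preserve long names.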

Numpsy · 2020-06-20T17:06:06Z

Is this obsoleted by #364 being merged?

piksel · 2020-06-22T08:25:08Z

Indeed. Thanks.

writing correct Tar UTF8 filenames

f82ffe5

piksel added a commit to piksel/SharpZipLib that referenced this pull request Jul 1, 2018

Add repro for icsharpcode#121 (and icsharpcode#182)

ec138b9

Merge branch 'master' into master

62f4d00

piksel reviewed Jun 15, 2019


itn3000 mentioned this pull request Jun 25, 2019

add encoding parameter to creating tar entry #364

Merged

piksel mentioned this pull request Jul 29, 2019

I think we need the IsUnicodeText property in TarEntry #156

Closed

piksel closed this Jun 22, 2020
