Skip to content

Conversation

@TalAloni
Copy link

This is to amend PR #255, note that before this patch the UnicodeText flag is set in the ZipEntry constructor, after we have already read the name as non-unicode, which is obviously not right.

I certify that I own, and have sufficient rights to contribute, all source code and related material intended to be compiled or integrated with the source code for the SharpZipLib open source product (the "Contribution"). My Contribution is licensed under the MIT License.

…nicode is set

This is to amend PR icsharpcode#255, note that before this patch the UnicodeText flag is set in the ZipEntry constructor, after we have already read the name as non-unicode, which is obviously not right.
@TalAloni
Copy link
Author

UTF8_Filenames.zip
Attached a sample file containing some UTF8 filenames.

@piksel
Copy link
Member

piksel commented Nov 12, 2018

This will probably be corrected by #280. Can you confirm this?

@piksel piksel closed this Nov 12, 2018
@TalAloni
Copy link
Author

Hi, #280 DOES NOT solve the issue.
Note that the other PR does not solve the issue with the UnicodeText flag not being set correctly. my PR solves that.

@TalAloni
Copy link
Author

Please reopen.

@piksel
Copy link
Member

piksel commented Nov 23, 2018

What is the issue that you are trying to fix? The property is read from the headers, manually overriding them by setting the property here does nothing as far as I can tell.

@TalAloni
Copy link
Author

I have attached a sample file, when reading the filenames they are not read as Unicode. I have tested the fix and it does solve the issue.
note that before this patch the UnicodeText flag is never set when calling ConvertToStringExt

@piksel
Copy link
Member

piksel commented Nov 23, 2018

No, its not set because thats not what it is for. ZipStrings uses Unicode if the Zip file header has the Unicode flag set. If the header is not used when decoding the file names, then there is another bug somewhere, but the UseUnicode property is not for the library to use internally, it's meant to be use for manually overriding the Encoding or specifying it for creating new archives.

@TalAloni
Copy link
Author

I'm not sure what the right solution is, I have attached the sample to demonstrate that there is a problem somewhere. please reopen.

@piksel
Copy link
Member

piksel commented Nov 23, 2018

Could you provide the code you are using? You should start by making an issue. Ideally with a short code sample that reproduces the issue. The testing code does not currently experience your problem afaik. I closed this because it is a PR that does not seem to fix an issue and introduces bugs.

@TalAloni
Copy link
Author

Are you saying that UseUnicode is for encoding only, and that there is no comparable flag to force Unicode decoding?
From what I've gathered zip files created by OS X do not have this 'bit' set, so SharpZipLib will assume that those files are not unicode-encoded. (some more information here

@piksel
Copy link
Member

piksel commented Nov 23, 2018

That is what #280 adds. The ability to override the flag for decoding.

@TalAloni
Copy link
Author

I do I turn it on?

@piksel
Copy link
Member

piksel commented Nov 23, 2018

I will write some code samples when 1.1 is released. Ill try to get it released this weekend, it's long due. Unfortunatly I've been rather busy as of late.

@TalAloni
Copy link
Author

OK, I can confirm that when using #280 and setting ZipStrings.CodePage = 65001 the file names are decoded correctly.
Note that ZipStrings.UseUnicode does not affect decoding (as far as I can tell) so the name is a bit misleading.

Thanks for all your efforts! much appreciated!

@piksel
Copy link
Member

piksel commented Nov 23, 2018

UseUnicode should definitely change the encoding used for extracting. I probably overlooked it when merging 280. Ill take a look.

@TalAloni
Copy link
Author

I think you meant to write 'UseUnicode should definitely change the decoding'

@TalAloni
Copy link
Author

I have observed that when setting ZipStrings.CodePage = 65001 the filenames are decoded correctly but
on subsequent encoding they will not be encoded correctly, here is a sample that demonstrate the problem:

static void Test()
{
    string inputPath = @"D:\UTF8_Filenames.zip";
    string outputPath = @"D:\Output.zip";
    FileStream inputStream = new FileStream(inputPath, FileMode.Open, FileAccess.Read);

    MemoryStream stream = new MemoryStream();
    CopyStream(inputStream, stream);

    ZipFile zipFile = new ZipFile(stream);
    zipFile.BeginUpdate();
    zipFile.AddDirectory("Test");
    zipFile.CommitUpdate();

    stream.Position = 0;
    FileStream outputStream = new FileStream(outputPath, FileMode.Create, FileAccess.Write, FileShare.None);
    CopyStream(stream, outputStream);
    inputStream.Close();
}

public static long CopyStream(Stream input, Stream output)
{
    // input may not support seeking, so don't use input.Position
    return CopyStream(input, output, Int64.MaxValue);
 }

public static long CopyStream(Stream input, Stream output, long count)
{
    const int MaxBufferSize = 1048576; // 1 MB
    int bufferSize = (int)Math.Min(MaxBufferSize, count);
    byte[] buffer = new byte[bufferSize];
    long totalBytesRead = 0;
    while (totalBytesRead < count)
    {
        int numberOfBytesToRead = (int)Math.Min(bufferSize, count - totalBytesRead);
        int bytesRead = input.Read(buffer, 0, numberOfBytesToRead);
        totalBytesRead += bytesRead;
        output.Write(buffer, 0, bytesRead);
        if (bytesRead == 0) // no more bytes to read from input stream
        {
            return totalBytesRead;
        }
    }
    return totalBytesRead;
}

@piksel
Copy link
Member

piksel commented Jan 17, 2019

In your example you are not setting the codepage?

@TalAloni
Copy link
Author

65001 is the default so it doesn't matter if I set it or not. (I checked)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants