Skip to content
Merged
Show file tree
Hide file tree
Changes from 23 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
08f2a8b
ref: Add public APIs.
carlossanlop Sep 22, 2021
973fc8c
src: Expose the archive and entry comments.
carlossanlop Sep 22, 2021
be54e4a
tests: Add update tests for archives and for entries. They cover crea…
carlossanlop Sep 22, 2021
28467ae
Fix encoding detection feedback
carlossanlop Sep 28, 2021
297b9e5
Fix encoding detection feedback
carlossanlop Sep 28, 2021
338b184
Address suggestions
carlossanlop Oct 6, 2021
31a2227
Switch names of archive comment fields.
carlossanlop Oct 18, 2021
78d7e37
Address unicode bit flag sharing problem.
carlossanlop Oct 19, 2021
0a29149
Add more test cases
carlossanlop Oct 19, 2021
b93305b
Adjust tests
carlossanlop Oct 19, 2021
7496628
Add newline so comment only applies to one line
carlossanlop Oct 19, 2021
a949883
Ensure string byte truncation is aligned to encoding's char size.
carlossanlop Nov 27, 2021
9738c30
Remove empty check for non-nullable string. Also remove unnecessary D…
carlossanlop Nov 29, 2021
5179ea2
Defer calculation of truncated encoding string to getter and to writ…
carlossanlop Dec 4, 2021
0e47de4
Rename test arguments
carlossanlop Dec 6, 2021
18bbcd4
Only use bytes[]
carlossanlop Dec 7, 2021
31602b3
Remove unnecessary bit comment
carlossanlop Dec 7, 2021
0ae8a64
Remove unnecessary length check
carlossanlop Dec 7, 2021
e9d5b67
Address feedback
carlossanlop Dec 8, 2021
77b6ee5
Suggestion by adamsitnik: write only if length > 0
carlossanlop Dec 7, 2021
8daedcf
Simplify EntryName code
carlossanlop Dec 8, 2021
1596bb3
Move entryName code to original location
carlossanlop Dec 8, 2021
3444631
In UTF8, use Runes to detect code point length to prevent truncating …
carlossanlop Jan 19, 2022
0749c84
Address suggestions
carlossanlop Feb 3, 2022
b08f0ac
Move ZipTestHelper back to its original position because S.IO.Compres…
carlossanlop Feb 3, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
52 changes: 52 additions & 0 deletions src/libraries/Common/tests/System/IO/Compression/ZipTestHelper.cs
Original file line number Diff line number Diff line change
Expand Up @@ -383,5 +383,57 @@ internal static void AddEntry(ZipArchive archive, string name, string contents,
w.WriteLine(contents);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should be in tests folder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I don't know why it was created there. Maybe we used to share something with another assembly, but it got removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah search didn't return the other file that consumes ZipTestHelper. It's the ZipFile.csproj. I'll revert the move of this file.

}
}

protected const string SmileyEmoji = "\ud83d\ude04";
protected const string LowerCaseOUmlautChar = "\u00F6";
protected const string CopyrightChar = "\u00A9";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's special about these versus '1' or 'a'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed chars that cannot be represented in ASCII.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you express that in the name of the const or in a comment?


// Returns pairs that are returned the same way by Utf8 and Latin1
// Returns: originalComment, expectedComment
private static IEnumerable<object[]> SharedComment_Data()
{
yield return new object[] { null, string.Empty };
yield return new object[] { string.Empty, string.Empty };
yield return new object[] { "a", "a" };
yield return new object[] { LowerCaseOUmlautChar, LowerCaseOUmlautChar };
}

// Returns pairs as expected by Utf8
// Returns: originalComment, expectedComment
public static IEnumerable<object[]> Utf8Comment_Data()
{
string asciiExpectedExactMaxLength = new('a', ushort.MaxValue);
string asciiOriginalOverMaxLength = asciiExpectedExactMaxLength + "aaa";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these prefixed as Ascii considering that the method name is Utf8Comment_Data?

Suggested change
string asciiExpectedExactMaxLength = new('a', ushort.MaxValue);
string asciiOriginalOverMaxLength = asciiExpectedExactMaxLength + "aaa";
string utf8ExpectedExactMaxLength = new('a', ushort.MaxValue);
string utf8OriginalOverMaxLength = asciiExpectedExactMaxLength + "aaa";

Copy link
Contributor Author

@carlossanlop carlossanlop Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's test data. This method return test cases for both ASCII and UTF8. Renaming these variables would be incorrect.


// A smiley emoji code point consists of two characters,
// meaning the whole emoji should be fully truncated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add a case where the emoji does not get truncated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only case where the string is truncated is when it's length is greater than expected. The contents don't matter. So if a string is smaller than the max length, it should remain untouched, regardless of its contents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

string utf8ExpectedJustALetters = new('a', ushort.MaxValue - 1);
string utf8OriginalALettersAndOneEmoji = utf8ExpectedJustALetters + SmileyEmoji;

foreach (object[] e in SharedComment_Data())
{
yield return e;
}
Comment on lines +428 to +431
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to the end of the method to avoid mixing what is being reused from what is unique to this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to yield the shared cases first.


yield return new object[] { asciiOriginalOverMaxLength, asciiExpectedExactMaxLength };
yield return new object[] { utf8OriginalALettersAndOneEmoji, utf8ExpectedJustALetters };
}

// Returns pairs as expected by Latin1
// Returns: originalComment, expectedComment
public static IEnumerable<object[]> Latin1Comment_Data()
{
// In Latin1, all characters are exactly 1 byte

string latin1ExpectedALettersAndOneOUmlaut = new string('a', ushort.MaxValue - 1) + LowerCaseOUmlautChar;
string latin1OriginalALettersAndTwoOUmlauts = latin1ExpectedALettersAndOneOUmlaut + LowerCaseOUmlautChar;

foreach (object[] e in SharedComment_Data())
{
yield return e;
}

yield return new object[] { latin1OriginalALettersAndTwoOUmlauts, latin1ExpectedALettersAndOneOUmlaut };
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,8 @@ public ZipArchive(System.IO.Stream stream) { }
public ZipArchive(System.IO.Stream stream, System.IO.Compression.ZipArchiveMode mode) { }
public ZipArchive(System.IO.Stream stream, System.IO.Compression.ZipArchiveMode mode, bool leaveOpen) { }
public ZipArchive(System.IO.Stream stream, System.IO.Compression.ZipArchiveMode mode, bool leaveOpen, System.Text.Encoding? entryNameEncoding) { }
[System.Diagnostics.CodeAnalysis.AllowNull]
public string Comment { get { throw null; } set { } }
public System.Collections.ObjectModel.ReadOnlyCollection<System.IO.Compression.ZipArchiveEntry> Entries { get { throw null; } }
public System.IO.Compression.ZipArchiveMode Mode { get { throw null; } }
public System.IO.Compression.ZipArchiveEntry CreateEntry(string entryName) { throw null; }
Expand All @@ -106,6 +108,8 @@ public partial class ZipArchiveEntry
{
internal ZipArchiveEntry() { }
public System.IO.Compression.ZipArchive Archive { get { throw null; } }
[System.Diagnostics.CodeAnalysis.AllowNull]
public string Comment { get { throw null; } set { } }
public long CompressedLength { get { throw null; } }
[System.CLSCompliantAttribute(false)]
public uint Crc32 { get { throw null; } }
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -218,8 +218,8 @@
<data name="EntriesInCreateMode" xml:space="preserve">
<value>Cannot access entries in Create mode.</value>
</data>
<data name="EntryNameEncodingNotSupported" xml:space="preserve">
<value>The specified entry name encoding is not supported.</value>
<data name="EntryNameAndCommentEncodingNotSupported" xml:space="preserve">
<value>The specified encoding is not supported for entry names and comments.</value>
</data>
<data name="EntryNamesTooLong" xml:space="preserve">
<value>Entry names cannot require more than 2^16 bits.</value>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Diagnostics;
using System.Diagnostics.CodeAnalysis;
using System.Text;

namespace System.IO.Compression
Expand All @@ -27,8 +28,8 @@ public class ZipArchive : IDisposable
private uint _numberOfThisDisk; //only valid after ReadCentralDirectory
private long _expectedNumberOfEntries;
private Stream? _backingStream;
private byte[]? _archiveComment;
private Encoding? _entryNameEncoding;
private byte[] _archiveComment;
private Encoding? _entryNameAndCommentEncoding;

#if DEBUG_FORCE_ZIP64
public bool _forceZip64;
Expand Down Expand Up @@ -121,7 +122,7 @@ public ZipArchive(Stream stream, ZipArchiveMode mode, bool leaveOpen, Encoding?
if (stream == null)
throw new ArgumentNullException(nameof(stream));

EntryNameEncoding = entryNameEncoding;
EntryNameAndCommentEncoding = entryNameEncoding;
Stream? extraTempStream = null;

try
Expand Down Expand Up @@ -173,7 +174,7 @@ public ZipArchive(Stream stream, ZipArchiveMode mode, bool leaveOpen, Encoding?
_centralDirectoryStart = 0; // invalid until ReadCentralDirectory
_isDisposed = false;
_numberOfThisDisk = 0; // invalid until ReadCentralDirectory
_archiveComment = null;
_archiveComment = Array.Empty<byte>();

switch (mode)
{
Expand Down Expand Up @@ -211,6 +212,20 @@ public ZipArchive(Stream stream, ZipArchiveMode mode, bool leaveOpen, Encoding?
}
}

/// <summary>
/// Gets or sets the optional archive comment.
/// </summary>
/// <remarks>
/// The comment encoding is determined by the <c>entryNameEncoding</c> parameter of the <see cref="ZipArchive(Stream,ZipArchiveMode,bool,Encoding?)"/> constructor.
/// If the comment byte length is larger than <see cref="ushort.MaxValue"/>, it will be truncated when disposing the archive.
/// </remarks>
[AllowNull]
public string Comment
{
get => (EntryNameAndCommentEncoding ?? Encoding.UTF8).GetString(_archiveComment);
set => _archiveComment = ZipHelper.GetEncodedTruncatedBytesFromString(value, EntryNameAndCommentEncoding, ZipEndOfCentralDirectoryBlock.ZipFileCommentMaxLength, out _);
}

/// <summary>
/// The collection of entries that are currently in the ZipArchive. This may not accurately represent the actual entries that are present in the underlying file or stream.
/// </summary>
Expand Down Expand Up @@ -345,9 +360,9 @@ public void Dispose()

internal uint NumberOfThisDisk => _numberOfThisDisk;

internal Encoding? EntryNameEncoding
internal Encoding? EntryNameAndCommentEncoding
{
get { return _entryNameEncoding; }
get => _entryNameAndCommentEncoding;

private set
{
Expand All @@ -370,10 +385,10 @@ private set
(value.Equals(Encoding.BigEndianUnicode)
|| value.Equals(Encoding.Unicode)))
{
throw new ArgumentException(SR.EntryNameEncodingNotSupported, nameof(EntryNameEncoding));
throw new ArgumentException(SR.EntryNameAndCommentEncodingNotSupported, nameof(EntryNameAndCommentEncoding));
}

_entryNameEncoding = value;
_entryNameAndCommentEncoding = value;
}
}

Expand Down Expand Up @@ -547,9 +562,7 @@ private void ReadEndOfCentralDirectory()

_expectedNumberOfEntries = eocd.NumberOfEntriesInTheCentralDirectory;

// only bother saving the comment if we are in update mode
if (_mode == ZipArchiveMode.Update)
_archiveComment = eocd.ArchiveComment;
_archiveComment = eocd.ArchiveComment;

TryReadZip64EndOfCentralDirectory(eocd, eocdStart);

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -40,10 +40,11 @@ public partial class ZipArchiveEntry
// only apply to update mode
private List<ZipGenericExtraField>? _cdUnknownExtraFields;
private List<ZipGenericExtraField>? _lhUnknownExtraFields;
private readonly byte[]? _fileComment;
private byte[] _fileComment;
private readonly CompressionLevel? _compressionLevel;
private bool _hasUnicodeEntryNameOrComment;

// Initializes, attaches it to archive
// Initializes a ZipArchiveEntry instance for an existing archive entry.
internal ZipArchiveEntry(ZipArchive archive, ZipCentralDirectoryFileHeader cd)
{
_archive = archive;
Expand Down Expand Up @@ -72,17 +73,22 @@ internal ZipArchiveEntry(ZipArchive archive, ZipCentralDirectoryFileHeader cd)
_everOpenedForWrite = false;
_outstandingWriteStream = null;

FullName = DecodeEntryName(cd.Filename);
_storedEntryNameBytes = cd.Filename;
_storedEntryName = (_archive.EntryNameAndCommentEncoding ?? Encoding.UTF8).GetString(_storedEntryNameBytes);
DetectEntryNameVersion();

_lhUnknownExtraFields = null;
// the cd should have these as null if we aren't in Update mode
// the cd should have this as null if we aren't in Update mode
_cdUnknownExtraFields = cd.ExtraFields;

_fileComment = cd.FileComment;

_compressionLevel = null;

_hasUnicodeEntryNameOrComment = (_generalPurposeBitFlag & BitFlagValues.UnicodeFileNameAndComment) != 0;
}

// Initializes new entry
// Initializes a ZipArchiveEntry instance for a new archive entry with a specified compression level.
internal ZipArchiveEntry(ZipArchive archive, string entryName, CompressionLevel compressionLevel)
: this(archive, entryName)
{
Expand All @@ -93,7 +99,7 @@ internal ZipArchiveEntry(ZipArchive archive, string entryName, CompressionLevel
}
}

// Initializes new entry
// Initializes a ZipArchiveEntry instance for a new archive entry.
internal ZipArchiveEntry(ZipArchive archive, string entryName)
{
_archive = archive;
Expand Down Expand Up @@ -125,7 +131,8 @@ internal ZipArchiveEntry(ZipArchive archive, string entryName)

_cdUnknownExtraFields = null;
_lhUnknownExtraFields = null;
_fileComment = null;

_fileComment = Array.Empty<byte>();

_compressionLevel = null;

Expand All @@ -137,6 +144,8 @@ internal ZipArchiveEntry(ZipArchive archive, string entryName)
{
_archive.AcquireArchiveStream(this);
}

_hasUnicodeEntryNameOrComment = false;
}

/// <summary>
Expand Down Expand Up @@ -174,6 +183,24 @@ public int ExternalAttributes
}
}

/// <summary>
/// Gets or sets the optional entry comment.
/// </summary>
/// <remarks>
///The comment encoding is determined by the <c>entryNameEncoding</c> parameter of the <see cref="ZipArchive(Stream,ZipArchiveMode,bool,Encoding?)"/> constructor.
/// If the comment byte length is larger than <see cref="ushort.MaxValue"/>, it will be truncated when disposing the archive.
/// </remarks>
[AllowNull]
public string Comment
{
get => (_archive.EntryNameAndCommentEncoding ?? Encoding.UTF8).GetString(_fileComment);
set
{
_fileComment = ZipHelper.GetEncodedTruncatedBytesFromString(value, _archive.EntryNameAndCommentEncoding, ushort.MaxValue, out bool isUTF8);
_hasUnicodeEntryNameOrComment |= isUTF8;
}
}

/// <summary>
/// The relative path of the entry as stored in the Zip archive. Note that Zip archives allow any string to be the path of the entry, including invalid and absolute paths.
/// </summary>
Expand All @@ -191,17 +218,13 @@ private set
if (value == null)
throw new ArgumentNullException(nameof(FullName));

bool isUTF8;
_storedEntryNameBytes = EncodeEntryName(value, out isUTF8);
_storedEntryName = value;
_storedEntryNameBytes = ZipHelper.GetEncodedTruncatedBytesFromString(
value, _archive.EntryNameAndCommentEncoding, 0 /* No truncation */, out bool hasUnicodeEntryName);

if (isUTF8)
_generalPurposeBitFlag |= BitFlagValues.UnicodeFileName;
else
_generalPurposeBitFlag &= ~BitFlagValues.UnicodeFileName;
_hasUnicodeEntryNameOrComment |= hasUnicodeEntryName;
_storedEntryName = value;

if (ParseFileName(value, _versionMadeByPlatform) == "")
VersionToExtractAtLeast(ZipVersionNeededValues.ExplicitDirectory);
DetectEntryNameVersion();
}
}

Expand Down Expand Up @@ -396,39 +419,6 @@ private CompressionMethodValues CompressionMethod
}
}

private string DecodeEntryName(byte[] entryNameBytes)
{
Debug.Assert(entryNameBytes != null);

Encoding readEntryNameEncoding;
if ((_generalPurposeBitFlag & BitFlagValues.UnicodeFileName) == 0)
{
readEntryNameEncoding = _archive == null ?
Encoding.UTF8 :
_archive.EntryNameEncoding ?? Encoding.UTF8;
}
else
{
readEntryNameEncoding = Encoding.UTF8;
}

return readEntryNameEncoding.GetString(entryNameBytes);
}

private byte[] EncodeEntryName(string entryName, out bool isUTF8)
{
Debug.Assert(entryName != null);

Encoding writeEntryNameEncoding;
if (_archive != null && _archive.EntryNameEncoding != null)
writeEntryNameEncoding = _archive.EntryNameEncoding;
else
writeEntryNameEncoding = ZipHelper.RequiresUnicode(entryName) ? Encoding.UTF8 : Encoding.ASCII;

isUTF8 = writeEntryNameEncoding.Equals(Encoding.UTF8);
return writeEntryNameEncoding.GetBytes(entryName);
}

// does almost everything you need to do to forget about this entry
// writes the local header/data, gets rid of all the data,
// closes all of the streams except for the very outermost one that
Expand Down Expand Up @@ -516,6 +506,11 @@ internal void WriteCentralDirectoryFileHeader()
extraFieldLength = (ushort)bigExtraFieldLength;
}

if (_hasUnicodeEntryNameOrComment)
_generalPurposeBitFlag |= BitFlagValues.UnicodeFileNameAndComment;
else
_generalPurposeBitFlag &= ~BitFlagValues.UnicodeFileNameAndComment;

writer.Write(ZipCentralDirectoryFileHeader.SignatureConstant); // Central directory file header signature (4 bytes)
writer.Write((byte)_versionMadeBySpecification); // Version made by Specification (version) (1 byte)
writer.Write((byte)CurrentZipPlatform); // Version made by Compatibility (type) (1 byte)
Expand All @@ -529,10 +524,9 @@ internal void WriteCentralDirectoryFileHeader()
writer.Write((ushort)_storedEntryNameBytes.Length); // File Name Length (2 bytes)
writer.Write(extraFieldLength); // Extra Field Length (2 bytes)

// This should hold because of how we read it originally in ZipCentralDirectoryFileHeader:
Debug.Assert((_fileComment == null) || (_fileComment.Length <= ushort.MaxValue));
Debug.Assert(_fileComment.Length <= ushort.MaxValue);

writer.Write(_fileComment != null ? (ushort)_fileComment.Length : (ushort)0); // file comment length
writer.Write((ushort)_fileComment.Length);
writer.Write((ushort)0); // disk number start
writer.Write((ushort)0); // internal file attributes
writer.Write(_externalFileAttr); // external file attributes
Expand All @@ -546,7 +540,7 @@ internal void WriteCentralDirectoryFileHeader()
if (_cdUnknownExtraFields != null)
ZipGenericExtraField.WriteAllBlocks(_cdUnknownExtraFields, _archive.ArchiveStream);

if (_fileComment != null)
if (_fileComment.Length > 0)
writer.Write(_fileComment);
}

Expand Down Expand Up @@ -596,6 +590,14 @@ internal void ThrowIfNotOpenable(bool needToUncompress, bool needToLoadIntoMemor
throw new InvalidDataException(message);
}

private void DetectEntryNameVersion()
{
if (ParseFileName(_storedEntryName, _versionMadeByPlatform) == "")
{
VersionToExtractAtLeast(ZipVersionNeededValues.ExplicitDirectory);
}
}

private CheckSumAndSizeWriteStream GetDataCompressor(Stream backingStream, bool leaveBackingStreamOpen, EventHandler? onClose)
{
// stream stack: backingStream -> DeflateStream -> CheckSumWriteStream
Expand Down Expand Up @@ -1303,7 +1305,7 @@ protected override void Dispose(bool disposing)
}

[Flags]
internal enum BitFlagValues : ushort { DataDescriptor = 0x8, UnicodeFileName = 0x800 }
internal enum BitFlagValues : ushort { DataDescriptor = 0x8, UnicodeFileNameAndComment = 0x800 }

internal enum CompressionMethodValues : ushort { Stored = 0x0, Deflate = 0x8, Deflate64 = 0x9, BZip2 = 0xC, LZMA = 0xE }
}
Expand Down
Loading