
Document ZipReader DirectoryEntry behavior and add verification test #1054

Merged
adamhathcock merged 6 commits into master from copilot/fix-zipreader-directory-entry
Nov 29, 2025


Contributor

Copilot AI commented Nov 29, 2025

  • Investigate the ZipReader vs ZipArchive behavior for DirectoryEntry headers
  • Run tests to verify current behavior
  • Analyze whether ignoring DirectoryEntry is intentional
  • Determine if any fix is needed based on analysis
  • Add tests to document expected behavior
  • Add documentation to FORMATS.md about multi-volume ZIP requirement
  • Add documentation about ZipReader directory and central directory handling
  • Apply code review feedback
  • Run code review

Investigation Summary

The issue reports that ZipReader ignores DirectoryEntry headers completely. After thorough investigation:

Current Behavior Analysis:

  1. ZipReader (streaming): Processes LocalEntry headers with file data, skips DirectoryEntry headers from central directory
  2. ZipArchive (seekable): Uses DirectoryEntry headers from central directory and seeks to local entries for data
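To make the seekable path concrete, here is a minimal sketch of central-directory-first listing written against the .NET base class library only; it is not SharpCompress's implementation. It assumes a single-volume archive with no Zip64 records, and the `CentralDirectoryLister` class and `Entry` record are hypothetical names. It finds the End of Central Directory record, reads one central directory header per entry (the equivalent of a DirectoryEntry header), then seeks back to each entry's local header offset, which is the step a forward-only reader cannot take.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical sketch of the ZipArchive style: central directory first, then seek.
// Assumptions: single volume, no Zip64, no archive comment containing the EOCD signature.
public static class CentralDirectoryLister
{
    public sealed record Entry(string Name, ushort DiskNumberStart, uint LocalHeaderOffset);

    const uint EndOfCentralDirectorySig = 0x06054b50;
    const uint CentralDirectorySig = 0x02014b50;
    const uint LocalFileHeaderSig = 0x04034b50;

    public static List<Entry> ListEntries(Stream stream)
    {
        if (!stream.CanSeek)
            throw new NotSupportedException("Reading the central directory requires a seekable stream.");

        // 1. The End Of Central Directory record lies within the last 22 + 65535 bytes.
        int tailLength = (int)Math.Min(stream.Length, 22 + ushort.MaxValue);
        var tail = new byte[tailLength];
        stream.Seek(-tailLength, SeekOrigin.End);
        ReadFully(stream, tail);
        int eocd = -1;
        for (int i = tailLength - 22; i >= 0; i--)
        {
            if (BinaryPrimitives.ReadUInt32LittleEndian(tail.AsSpan(i)) == EndOfCentralDirectorySig)
            {
                eocd = i;
                break;
            }
        }
        if (eocd < 0)
            throw new InvalidDataException("End of central directory record not found.");
        ushort entryCount = BinaryPrimitives.ReadUInt16LittleEndian(tail.AsSpan(eocd + 10));
        uint centralDirectoryOffset = BinaryPrimitives.ReadUInt32LittleEndian(tail.AsSpan(eocd + 16));

        // 2. One central directory header (a "DirectoryEntry") per entry.
        var entries = new List<Entry>(entryCount);
        var header = new byte[46];
        stream.Seek(centralDirectoryOffset, SeekOrigin.Begin);
        for (int n = 0; n < entryCount; n++)
        {
            ReadFully(stream, header);
            if (BinaryPrimitives.ReadUInt32LittleEndian(header) != CentralDirectorySig)
                throw new InvalidDataException("Bad central directory header.");
            ushort nameLength = BinaryPrimitives.ReadUInt16LittleEndian(header.AsSpan(28));
            ushort extraLength = BinaryPrimitives.ReadUInt16LittleEndian(header.AsSpan(30));
            ushort commentLength = BinaryPrimitives.ReadUInt16LittleEndian(header.AsSpan(32));
            ushort diskNumberStart = BinaryPrimitives.ReadUInt16LittleEndian(header.AsSpan(34));
            uint localHeaderOffset = BinaryPrimitives.ReadUInt32LittleEndian(header.AsSpan(42));
            var name = new byte[nameLength];
            ReadFully(stream, name);
            stream.Seek(extraLength + commentLength, SeekOrigin.Current);
            entries.Add(new Entry(Encoding.UTF8.GetString(name), diskNumberStart, localHeaderOffset));
        }

        // 3. Seek back to each local header: only possible on a seekable stream.
        var signature = new byte[4];
        foreach (var entry in entries)
        {
            stream.Seek(entry.LocalHeaderOffset, SeekOrigin.Begin);
            ReadFully(stream, signature);
            if (BinaryPrimitives.ReadUInt32LittleEndian(signature) != LocalFileHeaderSig)
                throw new InvalidDataException($"No local header for {entry.Name}.");
        }
        return entries;
    }

    static void ReadFully(Stream stream, byte[] buffer)
    {
        int offset = 0;
        while (offset < buffer.Length)
        {
            int read = stream.Read(buffer, offset, buffer.Length - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
    }
}
```

In a multi-volume archive, `DiskNumberStart` is what the real ZipArchive code uses to pick the volume before seeking; this sketch only reports it.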

Key Findings:

  • For all tested single-volume ZIP files, both ZipReader and ZipArchive return identical entries
  • Directories are correctly identified from LocalEntry headers (where name ends with /)
  • In streaming mode, DirectoryEntry headers are redundant because:
    • All entries have corresponding LocalEntry headers in well-formed ZIPs
    • File data follows LocalEntry headers immediately
    • We cannot seek back in streaming mode anyway
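The findings above can be illustrated with a forward-only scan that never calls `Seek`. This is a hypothetical BCL-only sketch, not SharpCompress code, assuming no data descriptors (general purpose flag bit 3 clear) and no Zip64. Unlike the `GetEntries` loop quoted in the issue, which keeps reading and ignores DirectoryEntry headers until DirectoryEnd, this sketch simply stops at the first central directory header, since nothing after it adds new entries.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical forward-only scan in the ZipReader style: read each local file
// header, skip its data, stop once the central directory (or EOCD) is reached.
// Assumptions: no data descriptors, no Zip64, single volume. Never seeks.
public static class LocalHeaderScanner
{
    const uint LocalFileHeaderSig = 0x04034b50;
    const uint CentralDirectorySig = 0x02014b50;
    const uint EndOfCentralDirectorySig = 0x06054b50;

    public static IEnumerable<string> ReadEntryNames(Stream stream)
    {
        var signature = new byte[4];
        var header = new byte[26]; // fixed local-header fields after the signature
        while (true)
        {
            ReadFully(stream, signature);
            uint sig = U32(signature, 0);
            if (sig == CentralDirectorySig || sig == EndOfCentralDirectorySig)
                yield break; // every entry has already been yielded from its local header
            if (sig != LocalFileHeaderSig)
                throw new InvalidDataException($"Unexpected signature 0x{sig:x8}.");

            ReadFully(stream, header);
            ushort flags = U16(header, 2);
            uint compressedSize = U32(header, 14);
            ushort nameLength = U16(header, 22);
            ushort extraLength = U16(header, 24);
            if ((flags & 0x0008) != 0)
                throw new NotSupportedException("Entries with data descriptors are out of scope here.");

            var name = new byte[nameLength];
            ReadFully(stream, name);
            Skip(stream, extraLength);
            Skip(stream, compressedSize); // entry data follows its local header directly
            yield return Encoding.UTF8.GetString(name); // directories end with '/'
        }
    }

    // Little-endian field readers (ZIP stores all integers little-endian).
    static ushort U16(byte[] b, int o) => (ushort)(b[o] | (b[o + 1] << 8));
    static uint U32(byte[] b, int o) =>
        (uint)(b[o] | (b[o + 1] << 8) | (b[o + 2] << 16) | (b[o + 3] << 24));

    static void ReadFully(Stream stream, byte[] buffer)
    {
        int offset = 0;
        while (offset < buffer.Length)
        {
            int read = stream.Read(buffer, offset, buffer.Length - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
    }

    // Discards bytes by reading them, since a forward-only stream cannot seek.
    static void Skip(Stream stream, long count)
    {
        var scratch = new byte[4096];
        while (count > 0)
        {
            int read = stream.Read(scratch, 0, (int)Math.Min(scratch.Length, count));
            if (read == 0) throw new EndOfStreamException();
            count -= read;
        }
    }
}
```

For a well-formed single-volume archive, the names this yields match the central directory's list, which is why the DirectoryEntry headers add nothing in streaming mode.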

Multi-Volume ZIP Limitation:

  • For multi-volume ZIPs, ZipReader returns 0 entries because file data is in other volume files
  • This is a fundamental limitation of forward-only streaming - cannot seek to other files
  • Users should use ZipArchive for multi-volume ZIPs (requires seekable streams)
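The ZipArchive snippet quoted in the issue indexes volumes with `vols[deh.DiskNumberStart]`, so volumes need to be supplied in disk order. A small sketch, assuming the common split naming where `name.z01`, `name.z02`, ... precede a final `name.zip` segment; `SplitZipVolumes` is a hypothetical helper, not part of SharpCompress.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: order split-ZIP volume paths by disk number,
// assuming the .z01, .z02, ..., .zip naming scheme.
public static class SplitZipVolumes
{
    public static List<string> OrderByDisk(IEnumerable<string> paths)
    {
        var all = paths.ToList();
        var finalVolumes = all
            .Where(p => Path.GetExtension(p).Equals(".zip", StringComparison.OrdinalIgnoreCase))
            .ToList();
        if (finalVolumes.Count != 1)
            throw new ArgumentException("Expected exactly one .zip (final) volume.");

        var numbered = all
            .Where(p => p != finalVolumes[0])
            .Select(p => (FilePath: p, Index: ParseVolumeIndex(p)))
            .OrderBy(v => v.Index)
            .ToList();
        for (int i = 0; i < numbered.Count; i++)
        {
            // .z01 is disk 0, .z02 is disk 1, ...; a gap means a missing volume.
            if (numbered[i].Index != i + 1)
                throw new ArgumentException($"Missing volume .z{i + 1:00}.");
        }

        var ordered = numbered.Select(v => v.FilePath).ToList();
        ordered.Add(finalVolumes[0]); // the .zip segment is always the last disk
        return ordered;
    }

    static int ParseVolumeIndex(string path)
    {
        string extension = Path.GetExtension(path); // e.g. ".z01"
        if (extension.Length < 3
            || char.ToLowerInvariant(extension[1]) != 'z'
            || !int.TryParse(extension.Substring(2), out int index))
            throw new ArgumentException($"Not a split-ZIP volume: {path}");
        return index;
    }
}
```

The ordered paths could then be handed to ZipArchive, for example as `ordered.Select(p => new FileInfo(p))`, assuming the SharpCompress version in use exposes a `ZipArchive.Open` overload taking `IEnumerable<FileInfo>`; check the API of your version before relying on it.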

Conclusion:

The current behavior is intentional and correct for streaming mode. The maintainer's comment "I think I'm intentionally skipping them" is confirmed correct.

Changes Made

  1. Added explicit DirectoryEntry case in ZipReader.GetEntries() with documentation explaining why it's intentionally skipped
  2. Added test ZipReader_Returns_Same_Entries_As_ZipArchive verifying both APIs return identical entries for single-volume ZIPs
  3. Updated FORMATS.md with new "Zip Format Notes" section documenting:
    • Multi-volume/split ZIP archives require ZipArchive
    • ZipReader processes entries from LocalEntry headers (including directories)
    • ZipReader intentionally skips DirectoryEntry headers from central directory as they are redundant
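The merged test is not reproduced on this page. As a rough idea of what such a parity check can look like, the sketch below builds a small ZIP in memory with `System.IO.Compression` and compares the entries reported by SharpCompress's `ZipReader` and `ZipArchive`. It assumes xUnit and the `ZipReader.Open(Stream)`, `MoveToNextEntry()`, `Entry`, `ZipArchive.Open(Stream)` and `Entries` members as found in recent SharpCompress releases; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using SharpCompress.Archives.Zip;
using SharpCompress.Readers.Zip;
using Xunit;
using Bcl = System.IO.Compression;

// Hypothetical parity test sketch; not the test merged in this PR.
public class ZipEntryParitySketch
{
    // Builds a small single-volume ZIP: one directory entry and one file entry.
    static MemoryStream BuildSampleZip()
    {
        var ms = new MemoryStream();
        using (var zip = new Bcl.ZipArchive(ms, Bcl.ZipArchiveMode.Create, leaveOpen: true))
        {
            zip.CreateEntry("docs/"); // directory entries end with '/'
            using var writer = new StreamWriter(zip.CreateEntry("docs/readme.txt").Open());
            writer.Write("hello");
        }
        ms.Position = 0;
        return ms;
    }

    static List<(string? Key, bool IsDirectory)> Sorted(IEnumerable<(string? Key, bool IsDirectory)> items) =>
        items.OrderBy(i => i.Key, StringComparer.Ordinal).ToList();

    [Fact]
    public void ZipReader_lists_same_entries_as_ZipArchive()
    {
        // Streaming: entries come from LocalEntry headers only.
        var fromReader = new List<(string? Key, bool IsDirectory)>();
        using (var stream = BuildSampleZip())
        using (var reader = ZipReader.Open(stream))
        {
            while (reader.MoveToNextEntry())
                fromReader.Add((reader.Entry.Key, reader.Entry.IsDirectory));
        }

        // Seekable: entries come from DirectoryEntry headers in the central directory.
        using var archiveStream = BuildSampleZip();
        using var archive = ZipArchive.Open(archiveStream);
        var fromArchive = archive.Entries.Select(e => (e.Key, e.IsDirectory)).ToList();

        Assert.Equal(2, fromReader.Count);
        Assert.Equal(Sorted(fromArchive), Sorted(fromReader));
    }
}
```

Sorting by key keeps the comparison independent of the order in which each API happens to enumerate entries.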
Original prompt

This section details the original issue you should resolve.

<issue_title>ZipReader ignored DirectoryEntry completely</issue_title>
<issue_description>ZipReader -> ignored DirectoryEntry completely | Could not extract the zip using ZipReader

protected override IEnumerable<ZipEntry> GetEntries(Stream stream)
{
    foreach (var h in _headerFactory.ReadStreamHeader(stream))
    {
        if (h != null)
        {
            switch (h.ZipHeaderType)
            {
                case ZipHeaderType.LocalEntry:
                    {
                        yield return new ZipEntry(
                            new StreamingZipFilePart((LocalEntryHeader)h, stream)
                        );
                    }
                    break;
                case ZipHeaderType.DirectoryEnd:
                    {
                        yield break;
                    }
            }
        }
    }
}

ZipArchive -> processed DirectoryEntry | Could extract the zip using ZipArchive

switch (h.ZipHeaderType)
{
    case ZipHeaderType.DirectoryEntry:
        {
            var deh = (DirectoryEntryHeader)h;
            Stream s;
            if (
                deh.RelativeOffsetOfEntryHeader + deh.CompressedSize
                > vols[deh.DiskNumberStart].Stream.Length
            )
            {
                var v = vols.Skip(deh.DiskNumberStart).ToArray();
                s = new SourceStream(
                    v[0].Stream,
                    i => i < v.Length ? v[i].Stream : null,
                    new ReaderOptions() { LeaveStreamOpen = true }
                );
            }
            else
            {
                s = vols[deh.DiskNumberStart].Stream;
            }
            yield return new ZipArchiveEntry(
                this,
                new SeekableZipFilePart(headerFactory.NotNull(), deh, s)
            );
        }
        break;
    case ZipHeaderType.DirectoryEnd:
        {
            var bytes = ((DirectoryEndHeader)h).Comment ?? Array.Empty<byte>();
            vols.Last().Comment = ReaderOptions.ArchiveEncoding.Decode(bytes);
            yield break;
        }
}
</issue_description>

Comments on the Issue (you are @copilot in this section)

@adamhathcock I think I'm intentionally skipping them or used to. Will get copilot to check


@adamhathcock
Owner

@copilot continue

Contributor Author

Copilot AI commented Nov 29, 2025

@copilot continue

Investigation complete. The current behavior of ignoring DirectoryEntry headers in ZipReader is intentional and correct:

  1. For single-volume ZIPs, both ZipReader and ZipArchive return identical entries - tested and verified
  2. DirectoryEntry headers reference already-processed LocalEntry headers, making them redundant in streaming mode
  3. For multi-volume ZIPs, ZipArchive is required since streaming cannot seek across files

Changes made (commit f4dddce):

  • Added documentation comments explaining why DirectoryEntry is skipped
  • Added test verifying ZipReader and ZipArchive return same entries for standard ZIPs

Copilot AI changed the title from "[WIP] Fix ZipReader to process DirectoryEntry correctly" to "Document ZipReader DirectoryEntry behavior and add verification test" on Nov 29, 2025
Owner

@adamhathcock adamhathcock left a comment


Also add the new comments to the FORMATS.md file.

Contributor Author

Copilot AI commented Nov 29, 2025

Also add the new comments to the FORMATS.md file.

Added documentation to FORMATS.md in commit a887390. Updated note #2 for Zip format to include: "Multi-volume/split ZIP archives require ZipArchive (seekable streams) as ZipReader cannot seek across volume files."

@adamhathcock adamhathcock marked this pull request as ready for review November 29, 2025 15:33
Copilot AI review requested due to automatic review settings November 29, 2025 15:33
Contributor

Copilot AI left a comment


Pull request overview

This PR documents the intentional behavior of ZipReader skipping DirectoryEntry headers from the central directory during streaming operations. The investigation confirmed this is correct behavior for forward-only streaming mode where DirectoryEntry headers are redundant since all file data is already available from LocalEntry headers.

  • Added inline documentation explaining why DirectoryEntry headers are intentionally skipped in streaming mode
  • Added verification test confirming ZipReader and ZipArchive return identical entries for single-volume ZIP files
  • Updated FORMATS.md to document multi-volume ZIP requirements and ZipReader's entry processing behavior

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File | Description
src/SharpCompress/Readers/Zip/ZipReader.cs | Added documentation comments explaining why DirectoryEntry headers are intentionally skipped in streaming mode
tests/SharpCompress.Test/Zip/ZipReaderTests.cs | Added test verifying ZipReader and ZipArchive return identical entries for single-volume ZIP files
FORMATS.md | Added documentation about multi-volume ZIP archive requirements and ZipReader's LocalEntry/DirectoryEntry processing behavior


Comment on lines +78 to +83
// DirectoryEntry headers in the central directory are intentionally skipped.
// In streaming mode, we can only read forward, and DirectoryEntry headers
// reference LocalEntry headers that have already been processed. The file
// data comes from LocalEntry headers, not DirectoryEntry headers.
// For multi-volume ZIPs where file data spans multiple files, use ZipArchive
// instead, which requires seekable streams.

Copilot AI Nov 29, 2025


The comment explaining DirectoryEntry behavior is placed immediately before the DirectoryEnd case, which is confusing. It should be placed after the `break;` on line 77 but before the switch statement ends, or as a comment within the switch explaining the missing case. Consider adding it as:

                    break;
                case ZipHeaderType.DirectoryEntry:
                    // DirectoryEntry headers in the central directory are intentionally skipped.
                    // In streaming mode, we can only read forward, and DirectoryEntry headers
                    // reference LocalEntry headers that have already been processed. The file
                    // data comes from LocalEntry headers, not DirectoryEntry headers.
                    // For multi-volume ZIPs where file data spans multiple files, use ZipArchive
                    // instead, which requires seekable streams.
                    break;
                case ZipHeaderType.DirectoryEnd:

This makes it clear the comment applies to the DirectoryEntry case, not the DirectoryEnd case.

Suggested change
-// DirectoryEntry headers in the central directory are intentionally skipped.
-// In streaming mode, we can only read forward, and DirectoryEntry headers
-// reference LocalEntry headers that have already been processed. The file
-// data comes from LocalEntry headers, not DirectoryEntry headers.
-// For multi-volume ZIPs where file data spans multiple files, use ZipArchive
-// instead, which requires seekable streams.
+case ZipHeaderType.DirectoryEntry:
+// DirectoryEntry headers in the central directory are intentionally skipped.
+// In streaming mode, we can only read forward, and DirectoryEntry headers
+// reference LocalEntry headers that have already been processed. The file
+// data comes from LocalEntry headers, not DirectoryEntry headers.
+// For multi-volume ZIPs where file data spans multiple files, use ZipArchive
+// instead, which requires seekable streams.
+break;

Copilot uses AI. Check for mistakes.
FORMATS.md (outdated)

 1. SOLID Rars are only supported in the RarReader API.
-2. Zip format supports pkware and WinzipAES encryption. However, encrypted LZMA is not supported. Zip64 reading/writing is supported but only with seekable streams as the Zip spec doesn't support Zip64 data in post data descriptors. Deflate64 is only supported for reading.
+2. Zip format supports pkware and WinzipAES encryption. However, encrypted LZMA is not supported. Zip64 reading/writing is supported but only with seekable streams as the Zip spec doesn't support Zip64 data in post data descriptors. Deflate64 is only supported for reading. Multi-volume/split ZIP archives require ZipArchive (seekable streams) as ZipReader cannot seek across volume files. ZipReader processes entries from LocalEntry headers (which include directory entries ending with `/`) and intentionally skips DirectoryEntry headers from the central directory, as they are redundant in streaming mode - all entry data comes from LocalEntry headers which ZipReader has already processed.

Copilot AI Nov 29, 2025


This line is extremely long (over 500 characters), making it difficult to read and edit. Consider breaking it into multiple lines or moving the detailed explanation to a separate section with a reference from the table. For example:

2. Zip format supports pkware and WinzipAES encryption. However, encrypted LZMA is not supported. Zip64 reading/writing is supported but only with seekable streams as the Zip spec doesn't support Zip64 data in post data descriptors. Deflate64 is only supported for reading. See [Zip Format Notes](#zip-format-notes) for details on multi-volume archives and streaming behavior.

Then add a new section:

### Zip Format Notes
- Multi-volume/split ZIP archives require ZipArchive (seekable streams) as ZipReader cannot seek across volume files.
- ZipReader processes entries from LocalEntry headers (which include directory entries ending with `/`) and intentionally skips DirectoryEntry headers from the central directory, as they are redundant in streaming mode - all entry data comes from LocalEntry headers which ZipReader has already processed.

Owner


@copilot apply changes based on this feedback

Contributor Author


Applied the suggested changes in commit c5da416:

  • Moved the detailed Zip documentation to a new "Zip Format Notes" section with a reference from note #2
  • Added explicit DirectoryEntry case in ZipReader.cs with the documentation comment to clarify the intentional skip behavior


Development

Successfully merging this pull request may close these issues.

ZipReader ignored DirectoryEntry completely

3 participants