
Fix ZIP parsing failure on non-seekable streams with short reads #1169

Merged
adamhathcock merged 5 commits into release from copilot/fix-zip-parsing-regression
Jan 27, 2026
Conversation

Contributor

Copilot AI commented Jan 27, 2026

Fix ZIP parsing regression with short reads on non-seekable streams

Completed:

  • Understand the issue: BinaryReader doesn't handle short reads properly on non-seekable streams
  • Create a test that reproduces the issue with different chunking patterns
  • Fix the issue by ensuring SharpCompressStream handles short reads correctly
  • Implement buffer fill methods following the ReadFully pattern
  • Update all Read methods (sync and async variants)
  • Verify all ZIP tests pass (424 tests passed)
  • Verify all Reader tests pass (205 tests passed)
  • Apply code formatting with CSharpier
  • Run code review (2 minor comments about parameter validation in private methods - not critical)
  • Run security checks (0 alerts found)
  • Refactor to use ReadFully pattern as requested

Fix Details:

The regression occurred because SharpCompressStream.Read() with buffering enabled would return short reads when the underlying stream returned less data than requested. This caused BinaryReader to receive incomplete data when parsing ZIP headers, leading to ZlibException failures depending on the chunking pattern.

Solution: Modified SharpCompressStream to implement buffer fill methods that follow the ReadFully/ReadFullyAsync pattern from Utility.cs. These methods read from the underlying stream in a loop until the internal buffer is filled or EOF is reached, but return the actual count read (rather than a boolean) to handle EOF gracefully.
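The loop-until-full-or-EOF idea can be sketched as follows. This is an illustrative helper following the Utility.ReadFully shape described above, not the actual SharpCompressStream implementation; the name and signature here are assumptions:

```csharp
using System.IO;

static class BufferFillSketch
{
    // Hypothetical sketch of the fill pattern: keep calling Read until
    // the requested range is full or the stream reports EOF, then
    // return the number of bytes actually obtained (not a boolean).
    public static int FillBuffer(Stream stream, byte[] buffer, int offset, int count)
    {
        var total = 0;
        while (total < count)
        {
            var read = stream.Read(buffer, offset + total, count - total);
            if (read <= 0)
            {
                break; // EOF: return what we have so the caller can handle it
            }
            total += read;
        }
        return total;
    }
}
```

Returning the count rather than a success flag is what lets the caller distinguish "buffer partially filled at EOF" from "buffer completely filled", which the boolean-returning Utility.ReadFully cannot express.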

Changes made to SharpCompressStream.cs:

  • Modified FillBuffer() to follow the ReadFully pattern (same logic as Utility.ReadFully but returns count)
  • Modified FillBufferAsync() to follow the ReadFullyAsync pattern
  • Modified FillBufferMemoryAsync() to follow the ReadFullyAsync pattern
  • All methods use the same loop-until-full-or-EOF approach as the existing utility methods
  • Updated comments to reference the ReadFully pattern
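The async variants follow the same loop; a sketch of the shape (method name taken from the PR description, but the signature here is an assumption for illustration):

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

static class AsyncFillSketch
{
    // Illustrative async variant of the loop-until-full-or-EOF pattern,
    // mirroring the sync sketch but using Memory<byte> and ReadAsync.
    public static async Task<int> FillBufferAsync(
        Stream stream,
        Memory<byte> buffer,
        CancellationToken cancellationToken = default)
    {
        var total = 0;
        while (total < buffer.Length)
        {
            var read = await stream
                .ReadAsync(buffer.Slice(total), cancellationToken)
                .ConfigureAwait(false);
            if (read <= 0)
            {
                break; // EOF: report the bytes we did get
            }
            total += read;
        }
        return total;
    }
}
```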

Test coverage:

  • Created ZipShortReadTests with 9 test cases covering various chunking patterns
  • Includes extreme case of 1 byte at a time which previously failed
  • All 424 ZIP tests pass
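The hazard these tests guard against can be shown in isolation with a wrapper that returns at most one byte per Read call. This is a hypothetical helper in the spirit of the PR's PatternReadStream, not the actual test code:

```csharp
using System;
using System.IO;

// Hypothetical non-seekable wrapper that returns at most one byte per
// Read call. Short reads like this are legal Stream behavior, but they
// break any caller that assumes a single Read fills the requested buffer.
sealed class OneByteStream : Stream
{
    private readonly Stream inner;

    public OneByteStream(Stream inner) => this.inner = inner;

    public override int Read(byte[] buffer, int offset, int count) =>
        inner.Read(buffer, offset, Math.Min(count, 1));

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

A single `Read(buf, 0, 8)` on this wrapper returns 1 even when 8 bytes are available, which is exactly the condition the buffered path previously mishandled.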

Summary

This fix resolves the reported issue where SharpCompress ZIP parsing became sensitive to Stream.Read chunking patterns on non-seekable streams. The solution follows the established ReadFully pattern used elsewhere in the codebase.

Original prompt

This section details the original issue to resolve.

<issue_title>Regression: ZIP parsing fails depending on Stream.Read chunking on non-seekable streams</issue_title>
<issue_description>Hi,

I’m hitting what looks like a stream chunking sensitivity regression when reading ZIP archives from a non-seekable stream. With the exact same ZIP bytes, SharpCompress will sometimes fail depending only on how the underlying Stream.Read() splits the data.


Regression note

The exact same code path works correctly with SharpCompress 0.40.0.
The failure (ZlibException in the chunked case) starts occurring in newer versions.


Context / real-world scenario

This happens in a real ASP.NET Core streaming pipeline (multipart/form-data):

  • Source stream: HttpRequest.Body
  • Read via MultipartReader (multipart/form-data)
  • Archive entries are processed sequentially using ReaderFactory.Open(...).MoveToNextEntry()
  • Entry streams are non-seekable by design

A seemingly unrelated change (for example changing a text field value from "my-value" to "my-valu") shifts the alignment of the ZIP part by 1 byte, which changes the short-read pattern seen by SharpCompress and triggers a failure.

To make this report independent of ASP.NET / multipart, the repro below uses a custom non-seekable stream that returns legal short reads.


Reproduction

This snippet (with stream.zip) reads the same ZIP bytes three ways:

  1. Baseline: MemoryStream
  2. Non-seekable stream with short reads (first = 3816)
  3. Non-seekable stream with short reads (first = 3815)

Only case (3) fails with a ZlibException. Wrapping the stream with a simple coalescing wrapper fixes the issue.

var bytes = System.IO.File.ReadAllBytes("stream.zip");

Console.WriteLine($"SharpCompress: {typeof(ReaderFactory).Assembly.FullName}");
Console.WriteLine($"Input bytes: {bytes.Length}");

//
// Case 1 - Baseline
// Read the ZIP from a file-like stream (MemoryStream).
// This always works and serves as the reference behavior.
//
Console.WriteLine("\n== Baseline (MemoryStream) ==");
Dump(ReadEntries(new MemoryStream(bytes, writable: false)));

//
// Case 2 - Chunked non-seekable stream (first read = 3816 bytes)
// This simulates a network/multipart stream with legal short reads.
// This case still works.
//
Console.WriteLine("\n== Chunked (first=3816, then=4096) ==");
Dump(ReadEntries(new PatternReadStream(bytes, first: 3816, chunk: 4096)));

try
{
	//
	// Case 3 - Chunked non-seekable stream (first read = 3815 bytes)
	// Exact same input bytes, only the first Read() returns 1 byte less.
	//
	// This case works correctly on SharpCompress 0.40.0,
	// but throws a ZlibException on newer versions.
	//
	Console.WriteLine("\n== Chunked (first=3815, then=4096) ==");
	Dump(ReadEntries(new PatternReadStream(bytes, first: 3815, chunk: 4096)));
}
catch (ZlibException)
{
	//
	// Case 4 - Workaround
	// Wrap the same failing stream with a coalescing wrapper that
	// fills short reads. This makes SharpCompress behave correctly again.
	//
	Console.WriteLine("\n== Workaround: FillReadStream over chunked(3815/4096) ==");
	using (var s = new PatternReadStream(bytes, first: 3815, chunk: 4096))
	using (var fill = new FillReadStream(s))
	{
		Dump(ReadEntries(fill));
	}
}
static List<string> ReadEntries(Stream s)
{
	var names = new List<string>();
	using var reader = ReaderFactory.Open(s, new ReaderOptions { LeaveStreamOpen = true });

	while (reader.MoveToNextEntry())
	{
		if (reader.Entry.IsDirectory)
		{
			continue;
		}

		names.Add(reader.Entry.Key);

		using var es = reader.OpenEntryStream();
		es.CopyTo(Stream.Null);
	}

	return names;
}

static void Dump(List<string> names)
{
	Console.WriteLine($"Count={names.Count}");

	foreach (var n in names.Take(10))
	{
		Console.WriteLine(" - " + n);
	}

	if (names.Count > 10)
	{
		Console.WriteLine(" - ...");
	}
}

sealed class PatternReadStream : Stream
{
    private readonly MemoryStream inner;
    private readonly int first;
    private readonly int chunk;
    private bool firstDone;

    public PatternReadStream(byte[] bytes, int first, int chunk)
    {
        inner = new MemoryStream(bytes, writable: false);
        this.first = first;
        this.chunk = chunk;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int limit = !firstDone ? first : chunk;
        firstDone = true;

        int toRead = Math.Min(count, limit);
        return inner.Read(buffer, offset, toRead);
    }

    public override bool CanRead => true;

    public override bool CanSeek => false;

    public override bool CanWrite => false;

    public override long Length => throw new NotSupportedException();

    public override long Position { get => throw new NotSupportedException(); set => throw new NotSupportedException(); }

    public override void Flush() => throw new NotSupportedException();

    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();

    public override void SetLength(long value) => throw new NotSupportedException();

    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
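The FillReadStream workaround referenced in the repro is not included in the issue text; a minimal coalescing wrapper along those lines might look like this (an assumed shape for illustration, not the reporter's actual code):

```csharp
using System;
using System.IO;

// Assumed shape of the coalescing workaround: loop over the inner
// stream's Read until the caller's buffer is full or EOF is hit, so
// downstream parsers never observe a legal-but-short read.
sealed class FillReadStream : Stream
{
    private readonly Stream inner;

    public FillReadStream(Stream inner) => this.inner = inner;

    public override int Read(byte[] buffer, int offset, int count)
    {
        var total = 0;
        while (total < count)
        {
            var read = inner.Read(buffer, offset + total, count - total);
            if (read <= 0)
            {
                break; // EOF: return whatever was collected
            }
            total += read;
        }
        return total;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            inner.Dispose();
        }
        base.Dispose(disposing);
    }
}
```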





- Fixes adamhathcock/sharpcompress#1168


Copilot AI and others added 2 commits January 27, 2026 16:00
Copilot AI changed the title from "[WIP] Fix ZIP parsing failure with non-seekable streams" to "Fix ZIP parsing failure on non-seekable streams with short reads" on Jan 27, 2026
Copilot AI requested a review from adamhathcock January 27, 2026 16:08
@adamhathcock
Owner

@copilot can you use ReadFully or ReadExact?

Contributor Author

Copilot AI commented Jan 27, 2026

@copilot can you use ReadFully or ReadExact?

Done! Refactored the FillBuffer, FillBufferAsync, and FillBufferMemoryAsync methods to follow the same pattern as Utility.ReadFully/ReadFullyAsync. They now use the same loop-until-full-or-EOF logic, but return the actual count read (instead of a boolean) to handle EOF gracefully when filling the internal buffer. All 424 ZIP tests pass. See commit d59e4c2.

@adamhathcock adamhathcock marked this pull request as ready for review January 27, 2026 16:30
@kilo-code-bot
Contributor

kilo-code-bot bot commented Jan 27, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Files Reviewed (2 files)
  • src/SharpCompress/IO/SharpCompressStream.cs - 0 issues
  • tests/SharpCompress.Test/Zip/ZipShortReadTests.cs - 0 issues

Contributor

Copilot AI left a comment


Pull request overview

This PR fixes a regression where ZIP archive parsing fails on non-seekable streams when the underlying Stream.Read() returns short reads (fewer bytes than requested). The issue manifested in real-world scenarios like ASP.NET Core multipart/form-data streaming where chunking patterns could vary.

Changes:

  • Modified SharpCompressStream to implement buffer fill methods that handle short reads by following the ReadFully pattern
  • Added comprehensive test coverage with ZipShortReadTests to prevent future regressions
  • Updated package lock file (unrelated maintenance update for existing Mono.Posix.NETStandard dependency)

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File descriptions:

  • src/SharpCompress/IO/SharpCompressStream.cs: Added three private FillBuffer* methods that implement the ReadFully pattern to handle short reads from underlying streams. Updated all Read methods (sync and async variants) to use them.
  • tests/SharpCompress.Test/Zip/ZipShortReadTests.cs: New test class with 9 test cases covering various chunking patterns, including the extreme 1-byte-at-a-time case. Includes a PatternReadStream helper class to simulate non-seekable streams with controlled short reads.
  • tests/SharpCompress.Test/packages.lock.json: Lock file update reflecting the existing Mono.Posix.NETStandard dependency (used for Linux symlink tests; unrelated to this PR's functionality).

