Skip to content

[WIP] Improve I/O operations while reading big solid archives#1163

Closed
julianxhokaxhiu wants to merge 1 commit intoadamhathcock:masterfrom
julianxhokaxhiu:fix/lzma-slow
Closed

[WIP] Improve I/O operations while reading big solid archives#1163
julianxhokaxhiu wants to merge 1 commit intoadamhathcock:masterfrom
julianxhokaxhiu:fix/lzma-slow

Conversation

@julianxhokaxhiu
Copy link

@julianxhokaxhiu julianxhokaxhiu commented Jan 27, 2026

This PR is a tentative effort to improve the extraction time for 7z archives, especially ones that are compressed as 1 block solid using 16MB dictionary.

The main idea here is to reduce the number of access to cache ( while doing Skip ), improve performance in writing files to the disk ( use 1MB buffer by default ) and aggressively inline codepaths that are reused across executions ( avoiding jumps in the code but instead favoring a more linear execution at the expense of code size ).

Feel free to test it and report back. In my own case I can finally see the extraction of files in the archive mentioned in #1105

Below Claude Sonnet 4.5 summary


Performance Optimization Summary for 7Zip Solid Archive Extraction

Problem

7Zip extraction with large solid archives (16MB dictionaries, 1-block compression) was extremely slow in version 0.42.0+ compared to 0.41.0, taking hours instead of minutes on high-end hardware.

Root Causes Identified Through Profiling

  1. StreamExtensions.Skip() consuming 94.54% CPU - byte-by-byte reading via CopyTo(Stream.Null)
  2. Excessive Path.GetFullPath() calls - Called 3+ times per extracted file
  3. Small I/O buffers - 80KB buffers insufficient for modern NVMe drives
  4. Method call overhead - Hot-path LZMA decompression methods not inlined

Optimizations Implemented

1. Skip Operation Optimization

File: StreamExtensions.cs

  • Changed from ReadOnlySubStream + CopyTo(Stream.Null) (byte-by-byte) to buffered reading
  • Increased buffer size from implicit default to 1MB using ArrayPool
  • Added fast path for BufferedSubStream to skip via internal method
  • Impact: Reduced Skip CPU usage from 94.54% → 82.24% → negligible

File: BufferedSubStream.cs

  • Added SkipInternal() method with 1MB buffer for efficient skipping
  • Skips cached data instantly, uses large buffered reads for remainder
  • Impact: Eliminates repeated RefillCache() calls when skipping large amounts of data

2. Increased I/O Buffer Sizes (80KB → 1MB)

Files Modified:

  • IArchiveEntryExtensions.cs - BufferSize constant
  • AbstractReader.cs - CopyTo/CopyToAsync calls

Rationale: Modern NVMe Gen 5 drives benefit significantly from larger sequential I/O operations

  • Impact: Better disk I/O throughput, reduced system calls

3. FileStream Optimization

File: IArchiveEntryExtensions.cs

  • Replaced File.Open() with explicit FileStream constructor
  • Specified 1MB buffer size explicitly
  • Added useAsync: true for async operations to enable overlapped I/O on Windows
  • Impact: Better async I/O performance, reduced context switching

4. Path Processing Optimization

File: ExtractionMethods.cs

  • Cached entry.Key to avoid multiple property accesses
  • Reduced Path.GetFullPath() calls from 3 per file to 1 by consolidating path operations
  • Combined Path.Combine calls: Combine(folder, file) instead of separate operations
  • Moved security validation before filesystem calls to avoid unnecessary I/O
  • Impact: WriteEntryToDirectory CPU reduced from 84.95% → 31.74% (63% reduction)

5. LZMA Decompression Micro-optimizations

Files Modified:

  • RangeCoder.cs
  • RangeCoderBit.cs
  • LzmaDecoder.cs

Added [MethodImpl(MethodImplOptions.AggressiveInlining)] to hot-path methods:

  • RangeCoder.Decoder: GetThreshold, Decode, DecodeBit, DecodeDirectBits
  • BitDecoder.Decode (called millions of times)
  • LzmaDecoder.LenDecoder.Decode
  • LzmaDecoder.Decoder2: DecodeNormal, DecodeWithMatchByte
  • LzmaDecoder.LiteralDecoder: GetState, DecodeNormal, DecodeWithMatchByte

Rationale: These methods are in tight decompression loops; inlining reduces call overhead and enables cross-method JIT optimizations

  • Impact: Reduced LZMA decompression overhead, though fundamental decompression work remains CPU-intensive as expected

Performance Results

  • Skip operation: No longer a bottleneck (was 94.54% CPU)
  • Path processing: 63% reduction in WriteEntryToDirectory overhead
  • Overall extraction: Significantly faster, especially noticeable with large solid archives
  • Hot path: Now dominated by actual LZMA decompression work (unavoidable)

Technical Notes

  • All buffer allocations use ArrayPool<byte> to minimize GC pressure
  • Changes maintain backward compatibility
  • Security validations (path traversal checks) preserved
  • Code formatted with CSharpier per project standards

Testing

Tested with 700MB+ solid 7zip archive with 16MB dictionary on AMD 9800X3D with NVMe Gen 5 drive. Extraction time improved from hours to minutes, matching performance of version 0.41.0.

Copilot AI review requested due to automatic review settings January 27, 2026 00:36
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves extraction performance for large solid 7z archives by reducing skip overhead, increasing I/O buffer sizes, and adding aggressive inlining in hot LZMA decode paths.

Changes:

  • Increase stream copy/write buffer sizes to 1MB for faster disk and stream I/O.
  • Optimize skipping on non-seekable streams, including a fast-path for BufferedSubStream.
  • Add AggressiveInlining attributes to selected LZMA range coder / decoder methods.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/SharpCompress/Readers/AbstractReader.cs Uses larger CopyTo/CopyToAsync buffers when writing entry data.
src/SharpCompress/Polyfills/StreamExtensions.cs Reworks Skip(long) to reduce read calls and adds a BufferedSubStream fast path.
src/SharpCompress/IO/BufferedSubStream.cs Adds SkipInternal to skip efficiently using cache + large reads.
src/SharpCompress/Compressors/LZMA/RangeCoder/RangeCoderBit.cs Adds inlining attribute to bit decode hot path.
src/SharpCompress/Compressors/LZMA/RangeCoder/RangeCoder.cs Adds inlining attributes to decoder methods used heavily in LZMA.
src/SharpCompress/Compressors/LZMA/LzmaDecoder.cs Adds inlining attributes to nested decoder hot paths.
src/SharpCompress/Common/ExtractionMethods.cs Refactors destination path computation/security checks and directory creation flow.
src/SharpCompress/Archives/IArchiveEntryExtensions.cs Increases extraction buffer size and uses larger FileStream buffers for file writes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +139 to +145
using var fs = new FileStream(
destinationFileName,
fm,
FileAccess.Write,
FileShare.None,
bufferSize: 1048576
); // 1MB buffer
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These FileStream constructors set bufferSize to 1MB, which typically causes a 1MB managed buffer allocation per extracted file (often LOH) in addition to the CopyTo buffer. This can lead to large transient allocations/LOH fragmentation when extracting many files. Consider relying on the default FileStream buffering or making the larger FileStream buffer size conditional/opt-in (and ideally reuse pooled buffers where possible).

Copilot uses AI. Check for mistakes.
Comment on lines +139 to +145
using var fs = new FileStream(
destinationFileName,
fm,
FileAccess.Write,
FileShare.None,
bufferSize: 1048576
); // 1MB buffer
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 1MB literal buffer size is duplicated here even though the file already defines BufferSize. Using the constant would prevent accidental divergence if the buffer size is tuned later.

Copilot uses AI. Check for mistakes.
Comment on lines +173 to +181
// For remaining data, we must actually read it from the underlying stream
// Use very large reads to minimize LZMA decompression call overhead
var skipBuffer = ArrayPool<byte>.Shared.Rent(1048576); // 1MB for skipping
try
{
while (advanceAmount > 0 && BytesLeftToRead > 0)
{
var toRead = (int)
Math.Min(Math.Min(advanceAmount, BytesLeftToRead), skipBuffer.Length);
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SkipInternal always rents a 1MB buffer even when BytesLeftToRead is already 0 or the remaining skip amount is small (after consuming cached bytes). Consider returning early when there’s nothing left to read/skip, and renting a buffer sized to min(remainingSkip, 1MB) to reduce unnecessary large rentals for small skips.

Copilot uses AI. Check for mistakes.
Comment on lines +345 to +358
internal async ValueTask WriteAsync(Stream writeStream, CancellationToken cancellationToken)
{
#if LEGACY_DOTNET
using Stream s = await OpenEntryStreamAsync(cancellationToken).ConfigureAwait(false);
var sourceStream = WrapWithProgress(s, Entry);
await sourceStream.CopyToAsync(writeStream, 81920, cancellationToken).ConfigureAwait(false);
await sourceStream
.CopyToAsync(writeStream, 1048576, cancellationToken)
.ConfigureAwait(false); // 1MB buffer
#else
await using Stream s = await OpenEntryStreamAsync(cancellationToken).ConfigureAwait(false);
var sourceStream = WrapWithProgress(s, Entry);
await sourceStream.CopyToAsync(writeStream, 81920, cancellationToken).ConfigureAwait(false);
await sourceStream
.CopyToAsync(writeStream, 1048576, cancellationToken)
.ConfigureAwait(false); // 1MB buffer
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CopyToAsync with a 1MB buffer will allocate a new 1MB buffer per call (LOH) in the framework implementation. For many-entry extractions this can create substantial GC/LOH pressure. Consider using a pooled-buffer async copy loop (ArrayPool) or keeping the buffer size under the LOH threshold and making the larger size opt-in/configurable.

Copilot uses AI. Check for mistakes.
public static class IArchiveEntryExtensions
{
private const int BufferSize = 81920;
private const int BufferSize = 1048576; // 1MB buffer for better disk I/O performance
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting BufferSize to 1MB affects CopyTo/CopyToAsync below, which will allocate a new 1MB byte[] per call (LOH) in the framework implementation. This can significantly increase GC pressure for archives with many (especially small) entries. Consider keeping the CopyTo buffer below the LOH threshold or using a pooled-buffer copy implementation so the larger buffer doesn’t allocate per extraction call.

Suggested change
private const int BufferSize = 1048576; // 1MB buffer for better disk I/O performance
private const int BufferSize = 81920; // 80KB buffer to avoid LOH allocations while maintaining good I/O performance

Copilot uses AI. Check for mistakes.
@julianxhokaxhiu
Copy link
Author

I'll take care of Copilot suggestions tomorrow. Thanks

@adamhathcock
Copy link
Owner

There's probably a few things here that might need to be separate PRs to be consumable.

  1. making buffer size an option (ideally pooling) which I want to do
  2. improvements for copying/Skip extension methods
  3. improvements to LZMA in general

I definitely want to do them, but just break them down. It helps me as I'm in the middle of a large refactor to get async working all the way.

I'll make a stab at #1 today

@adamhathcock
Copy link
Owner

I just realized this PR is against master which is currently dev. You probably want release so we can get these kind of fixes out faster

@julianxhokaxhiu julianxhokaxhiu changed the base branch from master to release January 27, 2026 08:26
@julianxhokaxhiu
Copy link
Author

I'll have to rework the entire patch then as the PR states there are 249 file changes now which is not what I want. I didn't knew release is the branch currently used to release current nuget artifacts. I'll update this to point again to master and refine the patch, and eventually see how we can backport that to release after if it's ok with you. Let me know, cheers

@julianxhokaxhiu julianxhokaxhiu changed the base branch from release to master January 27, 2026 08:29
@adamhathcock
Copy link
Owner

I think the release branch is now have buffer size centralized into a single static which should make things easier.

I'll back port this to master which should have further changes later

#1165

@julianxhokaxhiu
Copy link
Author

I'll close this PR then in favor of yours, so we keep this one only as a ref.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants