[WIP] Improve I/O operations while reading big solid archives #1163
julianxhokaxhiu wants to merge 1 commit into adamhathcock:master from
Conversation
Pull request overview
Improves extraction performance for large solid 7z archives by reducing skip overhead, increasing I/O buffer sizes, and adding aggressive inlining in hot LZMA decode paths.
Changes:
- Increase stream copy/write buffer sizes to 1MB for faster disk and stream I/O.
- Optimize skipping on non-seekable streams, including a fast path for BufferedSubStream.
- Add AggressiveInlining attributes to selected LZMA range coder / decoder methods.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/SharpCompress/Readers/AbstractReader.cs | Uses larger CopyTo/CopyToAsync buffers when writing entry data. |
| src/SharpCompress/Polyfills/StreamExtensions.cs | Reworks Skip(long) to reduce read calls and adds a BufferedSubStream fast path. |
| src/SharpCompress/IO/BufferedSubStream.cs | Adds SkipInternal to skip efficiently using cache + large reads. |
| src/SharpCompress/Compressors/LZMA/RangeCoder/RangeCoderBit.cs | Adds inlining attribute to bit decode hot path. |
| src/SharpCompress/Compressors/LZMA/RangeCoder/RangeCoder.cs | Adds inlining attributes to decoder methods used heavily in LZMA. |
| src/SharpCompress/Compressors/LZMA/LzmaDecoder.cs | Adds inlining attributes to nested decoder hot paths. |
| src/SharpCompress/Common/ExtractionMethods.cs | Refactors destination path computation/security checks and directory creation flow. |
| src/SharpCompress/Archives/IArchiveEntryExtensions.cs | Increases extraction buffer size and uses larger FileStream buffers for file writes. |
```csharp
using var fs = new FileStream(
    destinationFileName,
    fm,
    FileAccess.Write,
    FileShare.None,
    bufferSize: 1048576
); // 1MB buffer
```
These FileStream constructors set bufferSize to 1MB, which typically causes a 1MB managed buffer allocation per extracted file (often LOH) in addition to the CopyTo buffer. This can lead to large transient allocations/LOH fragmentation when extracting many files. Consider relying on the default FileStream buffering or making the larger FileStream buffer size conditional/opt-in (and ideally reuse pooled buffers where possible).
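A sketch of the reviewer's suggestion, assuming a hypothetical `largeBuffer` opt-in flag that is not part of the PR:

```csharp
// Hypothetical opt-in: callers request the large buffer only when it pays off
// (few, large files). Otherwise the default FileStream buffer avoids a
// per-file 1MB managed allocation, which lands on the Large Object Heap.
const int DefaultFileStreamBuffer = 4096;   // FileStream's documented default
const int LargeFileStreamBuffer = 1048576;  // 1MB, opt-in only

static FileStream OpenDestination(string destinationFileName, FileMode fm, bool largeBuffer) =>
    new FileStream(
        destinationFileName,
        fm,
        FileAccess.Write,
        FileShare.None,
        bufferSize: largeBuffer ? LargeFileStreamBuffer : DefaultFileStreamBuffer
    );
```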
```csharp
using var fs = new FileStream(
    destinationFileName,
    fm,
    FileAccess.Write,
    FileShare.None,
    bufferSize: 1048576
); // 1MB buffer
```
The 1MB literal buffer size is duplicated here even though the file already defines BufferSize. Using the constant would prevent accidental divergence if the buffer size is tuned later.
```csharp
// For remaining data, we must actually read it from the underlying stream
// Use very large reads to minimize LZMA decompression call overhead
var skipBuffer = ArrayPool<byte>.Shared.Rent(1048576); // 1MB for skipping
try
{
    while (advanceAmount > 0 && BytesLeftToRead > 0)
    {
        var toRead = (int)
            Math.Min(Math.Min(advanceAmount, BytesLeftToRead), skipBuffer.Length);
```
SkipInternal always rents a 1MB buffer even when BytesLeftToRead is already 0 or the remaining skip amount is small (after consuming cached bytes). Consider returning early when there’s nothing left to read/skip, and renting a buffer sized to min(remainingSkip, 1MB) to reduce unnecessary large rentals for small skips.
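One possible shape for that suggestion — a sketch, not the PR's actual code; the `Stream` member and surrounding method body are assumed:

```csharp
private void SkipInternal(long advanceAmount)
{
    // ... cached bytes are consumed before this point ...
    if (advanceAmount <= 0 || BytesLeftToRead <= 0)
    {
        return; // nothing left to read; avoid renting a buffer at all
    }

    // Rent min(remaining skip, 1MB) so small skips don't pull 1MB arrays.
    // Note Rent may return a larger array than requested; the loop below
    // bounds each read by the actual remaining amounts anyway.
    var rentSize = (int)Math.Min(Math.Min(advanceAmount, BytesLeftToRead), 1048576);
    var skipBuffer = ArrayPool<byte>.Shared.Rent(rentSize);
    try
    {
        while (advanceAmount > 0 && BytesLeftToRead > 0)
        {
            var toRead = (int)
                Math.Min(Math.Min(advanceAmount, BytesLeftToRead), skipBuffer.Length);
            var read = Stream.Read(skipBuffer, 0, toRead); // underlying stream (assumed member)
            if (read <= 0)
            {
                break; // unexpected end of stream
            }
            advanceAmount -= read;
            BytesLeftToRead -= read;
        }
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(skipBuffer);
    }
}
```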
```diff
 internal async ValueTask WriteAsync(Stream writeStream, CancellationToken cancellationToken)
 {
 #if LEGACY_DOTNET
     using Stream s = await OpenEntryStreamAsync(cancellationToken).ConfigureAwait(false);
     var sourceStream = WrapWithProgress(s, Entry);
-    await sourceStream.CopyToAsync(writeStream, 81920, cancellationToken).ConfigureAwait(false);
+    await sourceStream
+        .CopyToAsync(writeStream, 1048576, cancellationToken)
+        .ConfigureAwait(false); // 1MB buffer
 #else
     await using Stream s = await OpenEntryStreamAsync(cancellationToken).ConfigureAwait(false);
     var sourceStream = WrapWithProgress(s, Entry);
-    await sourceStream.CopyToAsync(writeStream, 81920, cancellationToken).ConfigureAwait(false);
+    await sourceStream
+        .CopyToAsync(writeStream, 1048576, cancellationToken)
+        .ConfigureAwait(false); // 1MB buffer
```
CopyToAsync with a 1MB buffer will allocate a new 1MB buffer per call (LOH) in the framework implementation. For many-entry extractions this can create substantial GC/LOH pressure. Consider using a pooled-buffer async copy loop (ArrayPool) or keeping the buffer size under the LOH threshold and making the larger size opt-in/configurable.
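A pooled-buffer async copy along the lines the review suggests could look like this — a sketch under the assumption that such a helper would be added; SharpCompress does not necessarily expose one:

```csharp
using System.Buffers;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

internal static class PooledCopy
{
    // Copies source to destination using a rented buffer, so the 1MB array
    // is reused from the pool instead of freshly allocated per CopyToAsync call.
    public static async Task CopyToPooledAsync(
        Stream source,
        Stream destination,
        int bufferSize,
        CancellationToken cancellationToken)
    {
        var buffer = ArrayPool<byte>.Shared.Rent(bufferSize);
        try
        {
            int read;
            while ((read = await source
                .ReadAsync(buffer, 0, buffer.Length, cancellationToken)
                .ConfigureAwait(false)) > 0)
            {
                await destination
                    .WriteAsync(buffer, 0, read, cancellationToken)
                    .ConfigureAwait(false);
            }
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```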
```diff
 public static class IArchiveEntryExtensions
 {
-    private const int BufferSize = 81920;
+    private const int BufferSize = 1048576; // 1MB buffer for better disk I/O performance
```
Setting BufferSize to 1MB affects CopyTo/CopyToAsync below, which will allocate a new 1MB byte[] per call (LOH) in the framework implementation. This can significantly increase GC pressure for archives with many (especially small) entries. Consider keeping the CopyTo buffer below the LOH threshold or using a pooled-buffer copy implementation so the larger buffer doesn’t allocate per extraction call.
```diff
-private const int BufferSize = 1048576; // 1MB buffer for better disk I/O performance
+private const int BufferSize = 81920; // 80KB buffer to avoid LOH allocations while maintaining good I/O performance
```
I'll take care of Copilot suggestions tomorrow. Thanks
There's probably a few things here that might need to be separate PRs to be consumable.
I definitely want to do them, but just break them down. It helps me as I'm in the middle of a large refactor to get async working all the way. I'll make a stab at #1 today
I just realized this PR is against
I'll have to rework the entire patch then, as the PR states there are 249 file changes now, which is not what I want. I didn't know
I think the release branch now has the buffer size centralized into a single static, which should make things easier. I'll backport this to master, which should have further changes later
I'll close this PR then in favor of yours, so we keep this one only as a ref.
This PR is a tentative effort to improve extraction time for 7z archives, especially ones compressed as a single solid block using a 16MB dictionary.
The main idea is to reduce the number of cache accesses (while doing Skip), improve performance when writing files to disk (use a 1MB buffer by default), and aggressively inline code paths that are reused across executions (avoiding jumps in the code in favor of more linear execution, at the expense of code size).
Feel free to test it and report back. In my own case, I can finally see files being extracted from the archive mentioned in #1105
Below is the Claude Sonnet 4.5 summary
Performance Optimization Summary for 7Zip Solid Archive Extraction
Problem
7Zip extraction with large solid archives (16MB dictionaries, 1-block compression) was extremely slow in version 0.42.0+ compared to 0.41.0, taking hours instead of minutes on high-end hardware.
Root Causes Identified Through Profiling
Optimizations Implemented
1. Skip Operation Optimization
File: StreamExtensions.cs
- Changed skipping from `ReadOnlySubStream` + `CopyTo(Stream.Null)` (byte-by-byte) to buffered reading
- Detects `BufferedSubStream` to skip via an internal method

File: BufferedSubStream.cs
- Added `SkipInternal()` method with 1MB buffer for efficient skipping

2. Increased I/O Buffer Sizes (80KB → 1MB)
Files Modified:
Rationale: Modern NVMe Gen 5 drives benefit significantly from larger sequential I/O operations
3. FileStream Optimization
File: IArchiveEntryExtensions.cs
- Replaced `File.Open()` with explicit `FileStream` constructor
- Uses `useAsync: true` for async operations to enable overlapped I/O on Windows

4. Path Processing Optimization
File: ExtractionMethods.cs
- Caches `entry.Key` to avoid multiple property accesses
- Uses a single `Combine(folder, file)` instead of separate operations

5. LZMA Decompression Micro-optimizations
Files Modified:
Added `[MethodImpl(MethodImplOptions.AggressiveInlining)]` to hot-path methods:
- `RangeCoder.Decoder`: GetThreshold, Decode, DecodeBit, DecodeDirectBits
- `BitDecoder.Decode` (called millions of times)
- `LzmaDecoder.LenDecoder.Decode`
- `LzmaDecoder.Decoder2`: DecodeNormal, DecodeWithMatchByte
- `LzmaDecoder.LiteralDecoder`: GetState, DecodeNormal, DecodeWithMatchByte

Rationale: These methods are in tight decompression loops; inlining reduces call overhead and enables cross-method JIT optimizations
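For illustration, the attribute is applied like this — a hypothetical method shape, not the actual decoder source:

```csharp
using System.Runtime.CompilerServices;

internal sealed class ExampleBitDecoder // hypothetical stand-in for the real decoder types
{
    private uint _prob;

    // AggressiveInlining asks the JIT to inline the method even where it would
    // normally decline, removing per-call overhead in the tight per-bit decode
    // loop and letting the JIT optimize across the call boundary. The trade-off
    // is larger generated machine code.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public uint Decode(/* range coder state elided */)
    {
        // ... per-bit range decoding would go here ...
        return _prob; // placeholder body for illustration
    }
}
```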
Performance Results
Technical Notes
- Uses `ArrayPool<byte>` to minimize GC pressure

Testing
Tested with 700MB+ solid 7zip archive with 16MB dictionary on AMD 9800X3D with NVMe Gen 5 drive. Extraction time improved from hours to minutes, matching performance of version 0.41.0.