
Bug: Memory exhaustion when auto-detecting a specific tar.lz archive #1021

@IDXGI

Description


Summary

When reading a specific .tar.lz file without providing an extension hint, the library attempts to auto-detect the format. This process incorrectly identifies the file as a Tar archive with a LongLink header, leading to an attempt to allocate a massive amount of memory (e.g., 20GB). This causes the application to either crash or fail to open the archive. Standard compression utilities can open this same file without any issues.

The root cause appears to be a lack of validation in TarHeader.Read() and its helper methods.

Steps to Reproduce

  1. Use the library to open a specially crafted .tar.lz file.
  2. Do not specify ReaderOptions.ExtensionHint, forcing the library to auto-detect the archive type.
  3. The library will fail to open the file after a massive memory spike.

Root Cause Analysis

The problem occurs because the auto-detection mechanism first tries to parse the file as a standard Tar archive. My file is a .tar.lz, but a byte at a specific offset is misinterpreted.

  1. In TarHeader.Read(), the code enters a loop to process headers.

    internal bool Read(BinaryReader reader)
    {
        string? longName = null;
        string? longLinkName = null;
        var hasLongValue = true;
        byte[] buffer;
        EntryType entryType;
    
        do
        {
            buffer = ReadBlock(reader);
    
            if (buffer.Length == 0)
            {
                return false;
            }
    
            entryType = ReadEntryType(buffer);
    
            // In my file, the byte at offset 157 is misinterpreted as EntryType.LongLink
            if (entryType == EntryType.LongName)
            {
                longName = ReadLongName(reader, buffer); // <- THIS LINE
                continue;
            }
            else if (entryType == EntryType.LongLink)
            {
                longLinkName = ReadLongName(reader, buffer); // <- THIS LINE
                continue;
            }
    
            hasLongValue = false;
        } while (hasLongValue);
    //...
    }
  2. For my specific file, the byte at offset 157 (read as entryType) happens to match EntryType.LongLink. (In the GNU tar format, the typeflag byte 'K', 0x4B, marks a LongLink entry, so any file with that byte value at this offset takes this path.) This triggers a call to TarHeader.ReadLongName().

  3. Inside ReadLongName(), the ReadSize(buffer) method calculates an extremely large value for nameLength based on the misinterpreted header data. The subsequent call to reader.ReadBytes(nameLength) attempts to allocate a massive array without any sanity checks.

    private string ReadLongName(BinaryReader reader, byte[] buffer)
    {
        var size = ReadSize(buffer); // Calculates a huge size
        var nameLength = (int)size;
        var nameBytes = reader.ReadBytes(nameLength); // <- ATTEMPTS HUGE ALLOCATION
        var remainingBytesToRead = BLOCK_SIZE - (nameLength % BLOCK_SIZE);
    
        // ...
        return ArchiveEncoding.Decode(nameBytes, 0, nameBytes.Length).TrimNulls();
    }
  4. The BinaryReader.ReadBytes() method directly allocates memory based on the provided count.

    public virtual byte[] ReadBytes(int count)
    {
        ArgumentOutOfRangeException.ThrowIfNegative(count);
        ThrowIfDisposed();
    
        if (count == 0)
        {
            return Array.Empty<byte>();
        }
    
        byte[] result = new byte[count]; // <- HUGE MEMORY ALLOCATION HAPPENS HERE
        int numRead = _stream.ReadAtLeast(result, result.Length, throwOnEndOfStream: false);
    
        // ...
        return result;
    }
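A defensive fix would be to bound the size before allocating. The sketch below is hypothetical (TarGuards, ValidateLongNameSize, the 1 MiB cap, and the choice of InvalidDataException are my assumptions, not existing SharpCompress API):

```csharp
using System.IO;

static class TarGuards
{
    // Assumption: real GNU long names are at most a few KiB; this cap is an
    // arbitrary safety margin, not a value taken from any tar specification.
    const long MaxLongNameSize = 1024 * 1024; // 1 MiB

    // Hypothetical helper: returns the size as an int if it is plausible for
    // a LongName/LongLink entry, otherwise rejects the header up front.
    public static int ValidateLongNameSize(long size, long remainingStreamBytes)
    {
        if (size < 0 || size > MaxLongNameSize || size > remainingStreamBytes)
        {
            throw new InvalidDataException($"Implausible long name size: {size}");
        }
        return (int)size;
    }
}
```

ReadLongName() could call such a check before reader.ReadBytes(), turning the multi-gigabyte allocation into an immediate, catchable format error.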

Stream Corruption

After the Tar parsing attempt fails (likely due to an EndOfStreamException or I/O error from Stream.ReadAtLeast()), the underlying Stream or SharpCompressStream appears to be left in a corrupted state.

When the auto-detection logic proceeds to the correct tar.lz format, it fails to read the header correctly. For example, it does not see the "LZIP" magic bytes at the beginning of the stream, even though debugging shows the bytes are present in the buffer. This strongly suggests that the stream's internal position or state has been irrecoverably altered by the failed read attempt.
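If the stream is seekable, each detection attempt could snapshot and restore the position so that a failed probe cannot poison the ones that follow. A minimal sketch of that idea (FormatProbe and TryProbe are hypothetical stand-ins, not the library's actual detection code):

```csharp
using System;
using System.IO;

static class FormatProbe
{
    // Hypothetical: run a per-format probe and guarantee the stream position
    // is restored, even if the probe throws or over-reads.
    public static bool TryProbe(Stream stream, Func<Stream, bool> probe)
    {
        if (!stream.CanSeek)
        {
            throw new ArgumentException("Probing requires a seekable stream.");
        }

        long start = stream.Position;
        try
        {
            return probe(stream);
        }
        catch (IOException) // e.g. EndOfStreamException from a failed tar parse
        {
            return false;
        }
        finally
        {
            stream.Position = start; // rewind regardless of outcome
        }
    }
}
```

With this pattern, the lzip detector would see the "LZIP" magic at the start of the stream even after the tar probe had failed.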

Workaround

The issue can be avoided by explicitly setting ReaderOptions.ExtensionHint to guide the parser. This skips the problematic Tar auto-detection step.

    // Example workaround
    var options = new ReaderOptions { ExtensionHint = "tar.lz" };
    using (var archive = ArchiveFactory.Open(filePath, options))
    {
        // ...
    }

However, most users would expect the auto-detection to be robust and would not think to set this option unless they have investigated the source code.
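Until detection is fixed, callers could also sniff the lzip magic themselves (lzip members begin with the ASCII bytes "LZIP" followed by a version byte) and only set the hint when it matches. LzipSniffer below is a hypothetical helper of mine, not library API:

```csharp
using System.IO;

static class LzipSniffer
{
    // Checks for the lzip magic ("LZIP") without disturbing the stream
    // position. Assumes a seekable stream that fills a 4-byte read when
    // at least 4 bytes remain (true for MemoryStream/FileStream).
    public static bool LooksLikeLzip(Stream stream)
    {
        long start = stream.Position;
        var header = new byte[4];
        int read = stream.Read(header, 0, header.Length);
        stream.Position = start; // leave the stream untouched for the real parser

        return read == 4
            && header[0] == (byte)'L'
            && header[1] == (byte)'Z'
            && header[2] == (byte)'I'
            && header[3] == (byte)'P';
    }
}
```

The result can then drive the workaround above, e.g. only constructing ReaderOptions with ExtensionHint = "tar.lz" when the magic is present.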
