Skip to content
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -202,19 +202,18 @@ internal static TarEntryType GetCorrectTypeFlagForFormat(TarEntryFormat format,
/// <summary>Parses a byte span that represents an ASCII string containing a number in octal base.</summary>
internal static T ParseOctal<T>(ReadOnlySpan<byte> buffer) where T : struct, INumber<T>
{
buffer = TrimEndingNullsAndSpaces(buffer);

T octalFactor = T.CreateTruncating(8u);
T value = T.Zero;
foreach (byte b in buffer)

// skip leading non-octal bytes
int offset = 0;
for (; offset < buffer.Length && (buffer[offset] < (byte)'0' || buffer[offset] > (byte)'7'); ++offset);

foreach (byte b in buffer.Slice(offset))
{
uint digit = (uint)(b - '0');
if (digit >= 8)
{
ThrowInvalidNumber();
}
if (b < (byte)'0' || b > (byte)'7') break;

value = checked((value * octalFactor) + T.CreateTruncating(digit));
value = checked((value * octalFactor) + T.CreateTruncating((uint)(b - '0')));
Comment on lines -212 to +216
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which assets have octal fields with non-octal characters on either end? I'm concerned that we are basically allowing garbage and not respecting the tar spec anymore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea let's not anywhere intentionally be more lenient than we discover existing tools are.

Copy link
Member Author

@am11 am11 Aug 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danmoseley, it is exactly as lenient as libarchive's octal parsing, it allows any non-octal character at the beginning and end.

GNU tar and busybox implementations allow both leading and trailing whitespaces and nulls.

.NET's implementation is the strictest one, we are only ignoring null and spaces when they appear at the end, and we do not allow them at the beginning.

I haven't find formal specification calling out for and against these behaviors, so it is up to us to decide which existing implementation to follow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But ignoring nulls and spaces is not the same as ignoring any character that is either smaller than '0' or larger than '7'.

Copy link
Member Author

@am11 am11 Aug 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libarchive ignores any non-octal character. :)
https://github.com/libarchive/libarchive/blob/9e5279615033fa073d0bbae83b366359eab3b62f/contrib/untar.c#L45

if this is too lenient, I can confine it to only allow nulls and spaces.

}

return value;
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
// Licensed to the .NET Foundation under one or more agreements.
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System.Collections.Generic;
Expand Down Expand Up @@ -125,28 +125,33 @@ public void Read_Archive_LongFileName_Over100_Under255(TarEntryFormat format, Te
public void Read_Archive_LongPath_Over255(TarEntryFormat format, TestTarFormat testFormat) =>
Read_Archive_LongPath_Over255_Internal(format, testFormat);

[Fact]
public void Read_NodeTarArchives_Successfully()
[Theory]
[InlineData("node-tar")]
[InlineData("tar-rs")]
public void Read_TestArchives_Successfully(string subset)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be improved so that we can investigate failures much more easily.

Instead of iterating through the files inside the test method, you can pass the file names as a MemberData. Plus they can be reused.

        public static IEnumerable<object[]> GetNodeTarFiles()
        {
            string nodeTarPath = Path.Join(Directory.GetCurrentDirectory(), "tar", "note-tar");
            foreach (string file in Directory.EnumerateFiles(nodeTarPath, "*.tar", SearchOption.AllDirectories))
            {
                yield return new object[] { file };
            }
        }

        public static IEnumerable<object[]> GetTarRsFiles()
        {
            string nodeTarPath = Path.Join(Directory.GetCurrentDirectory(), "tar", "tar-rs");
            foreach (string file in Directory.EnumerateFiles(nodeTarPath, "*.tar", SearchOption.AllDirectories))
            {
                yield return new object[] { file };
            }
        }

     [Theory]
        [MemberData(nameof("GetNodeTarFiles"))]
        [MemberData(nameof("GetTarRsFiles"))]
        public void Read_TestArchives_Successfully(string subset)

Then you can remove the forloop inside the test method.

{
string nodeTarPath = Path.Join(Directory.GetCurrentDirectory(), "tar", "node-tar");
string nodeTarPath = Path.Join(Directory.GetCurrentDirectory(), "tar", subset);
foreach (string file in Directory.EnumerateFiles(nodeTarPath, "*.tar", SearchOption.AllDirectories))
{
using FileStream sourceStream = File.Open(file, FileMode.Open, FileAccess.Read, FileShare.Read);
using var reader = new TarReader(sourceStream);

TarEntry? entry = null;
while (true)
while ((entry = reader.GetNextEntry()) != null)
{
Exception ex = Record.Exception(() => entry = reader.GetNextEntry());
Assert.Null(ex);

if (entry is null) break;

ex = Record.Exception(() => entry.Name);
Assert.Null(ex);

ex = Record.Exception(() => entry.Length);
Assert.Null(ex);
Assert.NotNull(entry.Name);
Assert.True(Enum.IsDefined(entry.EntryType));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would fail if an entry of one of the node-tar and tar-rs asserts has an entry type that is not in our TarEntryType enum. I checked, and we pass the value directly without validating it is among the predefined enums:

typeFlag: (TarEntryType)buffer[FieldLocations.TypeFlag])

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those files have known entry types, so this assert is valid for the current set of assets. If in the future, we ever updated assets in those directories from upstream into dotnet/runtime-assets, and those new files have some unknown entry types (very unlikely), we can adjust the code accordingly.

Assert.True(Enum.IsDefined(entry.Format));

if (entry.EntryType == TarEntryType.Directory)
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This directory entry check is unnecessary if the entry.DataStream is already being checked for null a few lines below.


var ds = entry.DataStream;
if (ds != null && ds.Length > 0)
{
using var memoryStream = new MemoryStream();
ds.CopyTo(memoryStream);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing is being done with the memory stream. What is this for?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look into the implementation of the stream that Tar returns; I read it into a dummy stream to ensure it could successfully read to the end. It seems a reasonable small check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah alright, that's fine. Although we already have tests that verify both seekable and unseekable data streams can be read.

}
}
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -20,19 +20,6 @@ public void MalformedArchive_TooSmall()
Assert.Throws<EndOfStreamException>(() => reader.GetNextEntry());
}

[Fact]
public void MalformedArchive_HeaderSize()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why were these tests removed?

Copy link
Contributor

@carlossanlop carlossanlop Aug 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but I am concerned that we now allow a whole header to contain invalid data. I do not think we want this level of flexiblity.

The test that is being removed would've thrown at some point due to having 512 consecutive 0x1 bytes. Let's analyze each one of the fields in the header:

From V7:

  • name: This can be assumed to be valid since we store it as string and don't validate the value until we extract.
  • mode, uid, guid, size, mtime, checksum: they all expect octal numbers. With your change, they would not throw if there are non-octal chars.
  • linkname: Same as name.

From Ustar and Gnu:

  • typeflag: We should be ok with any character here, even if it's not among the ones defined in our enum (this should throw right now, but it's a bug unrelated to this change, we should allow any ASCII character Verified, we allow values outside of the enum).
  • magic: this should throw. We only have three possible values: either it's all nulls, or it's the Ustar magic, or it's the Gnu magic.
  • version: Similar to magic.
  • uname/gname: would not throw, we store this as a string.
  • devmajor/devminor: Octal, this would pass with your change.
  • prefix: Same as name and linkname.

Then since the size had some data, but we didn't find any characters, we would set size as 0 with your change. So no data.

{
using MemoryStream malformed = new MemoryStream();
byte[] buffer = new byte[512]; // Minimum length of any header
Array.Fill<byte>(buffer, 0x1);
malformed.Write(buffer);
malformed.Seek(0, SeekOrigin.Begin);

using TarReader reader = new TarReader(malformed);
Assert.Throws<FormatException>(() => reader.GetNextEntry());
}

[Fact]
public void EmptyArchive()
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -39,23 +39,6 @@ public async Task MalformedArchive_TooSmall_Async()
}
}

[Fact]
public async Task MalformedArchive_HeaderSize_Async()
{
await using (MemoryStream malformed = new MemoryStream())
{
byte[] buffer = new byte[512]; // Minimum length of any header
Array.Fill<byte>(buffer, 0x1);
malformed.Write(buffer);
malformed.Seek(0, SeekOrigin.Begin);

await using (TarReader reader = new TarReader(malformed))
{
await Assert.ThrowsAsync<FormatException>(async () => await reader.GetNextEntryAsync());
}
}
}

[Fact]
public async Task EmptyArchive_Async()
{
Expand Down