Add System.Formats.Tar test assets #238

carlossanlop · 2022-04-06T20:56:35Z

@jozkee @adamsitnik @jeffhandley @ericstj

Now that the tar API proposal has been approved, I would like to add some test assets, which are a prerequisite for my main PR in dotnet/runtime.

This PR adds:

A new folder under src/ for System.Formats.Tar.TestData.
A bash script that generates the tar files.
A bunch of unarchived files, which are generated by the script.
A set of *.tar (uncompressed) and *.tar.gz (compressed with Gzip) files that are also generated by the script, by collecting the unarchived files.

The script generates files for all the supported formats:

V7
Ustar
Pax
Pax with a Global Extended Attributes entry at the beginning
GNU (the "old" format supported by the tar tool)
GNU (the current format supported by the tar tool)
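As a rough illustration of how these formats differ on disk, the sketch below uses Python's standard `tarfile` module (an assumption made for illustration; the PR itself relies on a bash script and the tar tool) to write the same one-file archive in the ustar, pax, and GNU formats, then reads the 8-byte magic/version field at offset 257 of the first header. `tarfile` cannot write V7 archives, so that format is left out. The file name `file.txt` and its contents are placeholders.

```python
import io
import tarfile


def first_header_magic(fmt: int) -> bytes:
    """Write a one-file archive in the given format and return the
    8-byte magic + version field of its first 512-byte header."""
    payload = b"Hello file"  # placeholder contents
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=fmt) as archive:
        info = tarfile.TarInfo(name="file.txt")  # hypothetical entry name
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    header = buffer.getvalue()[:512]
    # Offsets 257..264 hold the magic (6 bytes) and version (2 bytes).
    return header[257:265]


magics = {
    "ustar": first_header_magic(tarfile.USTAR_FORMAT),
    "pax": first_header_magic(tarfile.PAX_FORMAT),
    "gnu": first_header_magic(tarfile.GNU_FORMAT),
}

for format_name, magic in magics.items():
    print(format_name, magic)
```

Pax archives reuse the ustar header layout and magic, which is why a reader generally needs to look at the entry type flags, not only the magic, to tell the two apart.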

These are the test data sets that are generated by archiving files under the unarchived folder (the tests that consume these tar/tar.gz files can compare the contents of the archive with the contents of the unarchived folder):

file: Creates a text file.
folder_file: Creates a folder, then a text file inside that folder.
folder_file_utf8: Creates a folder, then a text file with UTF-8 characters in its filename.
folder_subfolder_file: Creates a folder, then a subfolder inside that folder, then a text file inside the subfolder.
folder_symlink_folder_subfolder_file: Creates a folder, then a subfolder inside that folder, then a text file inside that subfolder. Then at the root, creates a directory symlink to the subfolder.
file_symlink: Creates a text file, then a symlink to that file.
file_hardlink: Creates a text file, then a hardlink that points to the inode of the first file. Since it's not possible to differentiate between a normal file and a hardlink file (they are all hardlinks) the tar tool seems to generate them all as "regular files".
many_small_files: Creates 10 folders, then creates 10 text files inside each folder.
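The comparison described above (archive contents versus the unarchived folder) could look roughly like the following sketch. It uses Python's `tarfile` module rather than the .NET APIs under test, builds a throwaway `folder_file` data set in a temporary directory, and assumes the "Hello <data set>" content convention; the helper names `read_tree` and `read_archive` are hypothetical.

```python
import os
import tarfile
import tempfile


def read_tree(root: str) -> dict:
    """Map each regular file under root ('/'-separated relative path) to its bytes."""
    contents = {}
    for folder, _subfolders, files in os.walk(root):
        for file_name in files:
            full_path = os.path.join(folder, file_name)
            relative = os.path.relpath(full_path, root).replace(os.sep, "/")
            with open(full_path, "rb") as handle:
                contents[relative] = handle.read()
    return contents


def read_archive(archive_path: str) -> dict:
    """Map each regular-file entry in the archive to its bytes."""
    contents = {}
    # "r:*" lets tarfile detect gzip compression transparently.
    with tarfile.open(archive_path, mode="r:*") as archive:
        for member in archive.getmembers():
            if member.isfile():
                contents[member.name] = archive.extractfile(member).read()
    return contents


with tempfile.TemporaryDirectory() as workdir:
    # Recreate a tiny stand-in for unarchived/folder_file.
    unarchived = os.path.join(workdir, "unarchived", "folder_file")
    os.makedirs(os.path.join(unarchived, "folder"))
    with open(os.path.join(unarchived, "folder", "file.txt"), "wb") as handle:
        handle.write(b"Hello folder_file")  # assumed content convention

    archive_path = os.path.join(workdir, "folder_file.tar.gz")
    with tarfile.open(archive_path, mode="w:gz") as archive:
        for entry in sorted(os.listdir(unarchived)):
            archive.add(os.path.join(unarchived, entry), arcname=entry)

    expected = read_tree(unarchived)
    actual = read_archive(archive_path)
    matches = expected == actual

print(matches)
```

This only compares regular file contents; symlinks, hardlinks, and directory entries would need their own checks.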

These are the test data sets that are generated by creating the necessary files on the fly, then archiving them, because either git or NuGet does not support those file types (the tests that consume these tar/tar.gz files need to manually verify the expected contents, instead of comparing with the unarchived folder contents):

file_longsymlink: Creates a text file with a very long name, then a symlink to that file.
longfilename_over100_under255: Creates a text file whose filename is over 100 bytes but under 255 bytes.
longpath_over255: Creates a text file whose filename is over 255 bytes.
longpath_splitable_under255: Creates a folder with a name length of 98 bytes (it excludes the separator and the null char), then creates a text file under that folder with a filename length of 99 bytes (excludes the null char).
specialfiles: Creates a fifo, a character device, and a block device.
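The length thresholds in the long-path data sets follow from the ustar header layout, which has a 100-byte name field and a 155-byte prefix field, with the separator between them implied rather than stored. The sketch below is a simplified model of that split, not the logic of the tar tool or of the .NET implementation; `split_for_ustar` is a hypothetical helper, and it assumes the full field widths are usable (the description above reserves one byte for a null char).

```python
from typing import Optional, Tuple

NAME_FIELD_BYTES = 100    # size of the ustar "name" field
PREFIX_FIELD_BYTES = 155  # size of the ustar "prefix" field


def split_for_ustar(path: str) -> Optional[Tuple[str, str]]:
    """Return (prefix, name) if the path fits a plain ustar header, else None."""
    encoded = path.encode("utf-8")
    if len(encoded) <= NAME_FIELD_BYTES:
        return ("", path)
    # Try every '/' as the split point; the separator itself is not stored.
    for index, byte in enumerate(encoded):
        if byte != ord("/"):
            continue
        prefix, name = encoded[:index], encoded[index + 1:]
        if 0 < len(prefix) <= PREFIX_FIELD_BYTES and 0 < len(name) <= NAME_FIELD_BYTES:
            return (prefix.decode("utf-8"), name.decode("utf-8"))
    return None


# Mirrors longpath_splitable_under255: 98-byte folder, 99-byte file name.
splitable = "d" * 98 + "/" + "f" * 99
# Mirrors longfilename_over100_under255: one component longer than 100 bytes.
long_file_name = "f" * 150
# Mirrors longpath_over255.
very_long = "f" * 300

print(split_for_ustar(splitable) is not None)
print(split_for_ustar(long_file_name))
print(split_for_ustar(very_long))
```

Paths for which this returns `None` are the ones that need a pax extended header or a GNU long-name entry.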

src/System.Formats.Tar.TestData/System.Formats.Tar.TestData.csproj

jeffhandley · 2022-04-06T22:53:00Z

src/System.Formats.Tar.TestData/unarchived/file/file.txt

@@ -0,0 +1 @@
+Hello file


Instead of "Hello" in each file, could this have a single-line description of what scenario the file represents?

The reason why I did it like this was so I could verify the expected file contents with a simple enough message that applied to all tests.

What do you think about putting a description of each test case in the readme instead?

jeffhandley · 2022-04-06T22:54:55Z

src/System.Formats.Tar.TestData/GenerateTarFiles.sh

+
+    # Tar was executed elevated, need to ensure the
+    # generated archives are readable by current user
+    ResetOwnership $TargetDir


We might consider adding some error handling in here to ensure the ownership is reset if something fails above.

I added instructions indicating that the script needs to be executed by a user with sudo permission.

Also, this script isn't expected to be executed often. But if it needs to be, then it would be very noticeable if something didn't go right by looking at the git status.

src/System.Formats.Tar.TestData/unarchived/many_small_files/9/9.txt

jeffhandley · 2022-04-06T23:00:57Z

src/System.Formats.Tar.TestData/unarchived/many_small_files/9/8.txt

@@ -0,0 +1 @@
+Hello many_small_files


Are there any behaviors of the implementation that need to be tested with larger files, different content types, newlines, non-ASCII characters or other encoding scenarios, or anything else like that?

The contents of the file don't really matter much. The header stores the length in bytes of the data section, and the data section can contain anything in that space of the specified size.

But I can add tests (no need to add test assets) that verify that string fields do not have forbidden characters. For example, the Name field should not contain a newline.

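To make the point about the header concrete, here is a small sketch using Python's `tarfile` module (an illustration only, not the .NET implementation). It writes one ustar entry with arbitrary binary contents, then reads the size field back from the raw header: 12 bytes at offset 124, stored as octal ASCII digits. The helper names and the `file.txt` entry name are hypothetical, and the newline check is just one possible form of the string-field validation mentioned above.

```python
import io
import tarfile

BLOCK_SIZE = 512  # tar headers and data are laid out in 512-byte blocks


def build_archive(name: str, payload: bytes) -> bytes:
    """Write a single-entry ustar archive to memory and return its raw bytes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def parse_size_field(header: bytes) -> int:
    """The size field is 12 bytes at offset 124, stored as octal ASCII digits."""
    return int(header[124:136].rstrip(b"\0 ").decode("ascii"), 8)


def name_has_newline(header: bytes) -> bool:
    """The name field is the first 100 bytes of the header, padded with NULs."""
    name = header[:100].rstrip(b"\0")
    return b"\n" in name or b"\r" in name


# The data section can hold any bytes, including NULs and newlines.
payload = b"arbitrary \x00\xff bytes\nwith a newline"
raw = build_archive("file.txt", payload)
header = raw[:BLOCK_SIZE]
size = parse_size_field(header)
data = raw[BLOCK_SIZE:BLOCK_SIZE + size]

print(size == len(payload), data == payload, name_has_newline(header))
```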

carlossanlop · 2022-04-13T05:24:25Z

@jeffhandley @adamsitnik @jozkee can I get a new review? I addressed some of the comments and I prefer not to address a couple of others.

wfurt · 2022-04-13T05:48:05Z

It may be beyond the scope of this PR, but it would be nice to also have samples with:

extended attributes
same files that differ only in case, to check unpacking on case-insensitive systems
files with special characters, like ':' for Windows, spaces, '$', '\', '/', etc.
non-UTF-8 localized files
permissions beyond 777, e.g. the s and t bits

It may be hard/expensive, but also files bigger than MaxInt/MaxUint.
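For the "permissions beyond 777" suggestion, the sketch below shows what such an entry looks like at the header level. It uses Python's `tarfile` module as a stand-in (not the proposed .NET APIs), writes a zero-length entry with the setuid or sticky bit set, and reads the mode back; `round_trip_mode` and the entry name are hypothetical.

```python
import io
import stat
import tarfile


def round_trip_mode(mode: int) -> int:
    """Write an empty entry with the given mode, read it back, return the stored mode."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.PAX_FORMAT) as archive:
        info = tarfile.TarInfo(name="special_mode.txt")  # hypothetical entry name
        info.mode = mode
        archive.addfile(info)  # size is 0, so no data stream is needed
    buffer.seek(0)
    with tarfile.open(fileobj=buffer, mode="r") as archive:
        return archive.getmember("special_mode.txt").mode


setuid_mode = round_trip_mode(stat.S_ISUID | 0o755)  # the "s" bit
sticky_mode = round_trip_mode(stat.S_ISVTX | 0o777)  # the "t" bit

print(oct(setuid_mode), oct(sticky_mode))
```

Whether an extractor should actually apply these bits to files on disk is a separate policy question that a test asset alone does not answer.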

carlossanlop · 2022-04-13T06:30:15Z

Those are great suggestions, @wfurt. I can add those but I would like to do it in a separate PR so I don't keep blocking the main runtime PR.

carlossanlop requested review from adamsitnik and jozkee April 6, 2022 21:01

carlossanlop self-assigned this Apr 6, 2022

jeffhandley reviewed Apr 6, 2022

carlossanlop mentioned this pull request Apr 12, 2022

Implement Tar APIs dotnet/runtime#67883

Merged

carlossanlop and others added 3 commits April 12, 2022 09:51

System.Formats.Tar

93c4ef9

Add README.md and small comment adjustments in script.

d647bf6

The files of many_small_files should contain the folder and file numbers as inner text.

1919299

carlossanlop requested a review from jeffhandley April 13, 2022 05:24

carlossanlop added 2 commits April 12, 2022 22:29

Fix readme comment.

ec0aa32

Updated batch of tar and tar.gz files with newest fixes

fd224c2

jeffhandley approved these changes Apr 13, 2022

carlossanlop merged commit c7d5591 into dotnet:main Apr 13, 2022

carlossanlop deleted the SystemFormatTarAssets branch April 13, 2022 17:29
