@carlossanlop (Contributor) commented Apr 6, 2022

@jozkee @adamsitnik @jeffhandley @ericstj

Now that the tar API proposal has been approved, I would like to add some test assets, which are a prerequisite for my main PR in dotnet/runtime.

This PR adds:

  • A new folder under src/ for System.Formats.Tar.TestData.
  • A bash script that generates the tar files.
  • A bunch of unarchived files, which are generated by the script.
  • A set of *.tar (uncompressed) and *.tar.gz (compressed with Gzip) files that are also generated by the script, by collecting the unarchived files.

The script generates files for all the supported formats:

  • V7
  • Ustar
  • Pax
  • Pax with a Global Extended Attributes entry at the beginning
  • GNU (the "old" format supported by the tar tool)
  • GNU (the current format supported by the tar tool)
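
As a rough illustration of the list above, the loop below sketches how one data set could be archived in several formats. It assumes GNU tar, whose `--format` option accepts `v7`, `ustar`, `pax`, `oldgnu` and `gnu`; the folder layout and file names are hypothetical and not taken from the PR's script, and the Pax variant with a Global Extended Attributes entry is not covered.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU tar. Folder and file names are illustrative.
set -eu

work_dir="$(mktemp -d)"
mkdir -p "$work_dir/unarchived/file"
printf 'Hello file' > "$work_dir/unarchived/file/file.txt"

# GNU tar names for V7, Ustar, Pax, old GNU and current GNU.
formats="v7 ustar pax oldgnu gnu"

for format in $formats; do
    mkdir -p "$work_dir/tar/$format" "$work_dir/targz/$format"
    # -C makes the archived path relative to the data set folder.
    tar --format="$format" -cf "$work_dir/tar/$format/file.tar" \
        -C "$work_dir/unarchived/file" file.txt
    # Same content, compressed with gzip (-z).
    tar --format="$format" -czf "$work_dir/targz/$format/file.tar.gz" \
        -C "$work_dir/unarchived/file" file.txt
done
```

Keeping one output folder per format lets a test pick the archive for a given format by path alone.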

These are the test data sets that are generated by archiving files under the unarchived folder (the tests that consume these tar/tar.gz files can compare the contents of the archive with the contents of the unarchived folder):

  • file: Creates a text file.
  • folder_file: Creates a folder, then a text file inside that folder.
  • folder_file_utf8: Creates a folder, then a text file with UTF-8 characters in its filename.
  • folder_subfolder_file: Creates a folder, then a subfolder inside that folder, then a text file inside the subfolder.
  • folder_symlink_folder_subfolder_file: Creates a folder, then a subfolder inside that folder, then a text file inside that subfolder. Then at the root, creates a directory symlink to the subfolder.
  • file_symlink: Creates a text file, then a symlink to that file.
  • file_hardlink: Creates a text file, then a hardlink that points to the inode of the first file. Since it's not possible to differentiate between a normal file and a hardlink file (they are all hardlinks) the tar tool seems to generate them all as "regular files".
  • many_small_files: Creates 10 folders, then creates 10 text files inside each folder.
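
A minimal sketch of how three of these data sets could be created before archiving. The link and file names (`link.txt`, `hardlink.txt`, `folder_N`, `file_N.txt`) are assumptions made for illustration; the actual names used by the script are not shown in this conversation.

```shell
#!/usr/bin/env bash
# Sketch only: names of links, folders and files are hypothetical.
set -eu

data_root="$(mktemp -d)"

# file_symlink: a text file plus a relative symlink to it.
mkdir -p "$data_root/file_symlink"
printf 'Hello file' > "$data_root/file_symlink/file.txt"
ln -s file.txt "$data_root/file_symlink/link.txt"

# file_hardlink: a text file plus a second name for the same inode.
mkdir -p "$data_root/file_hardlink"
printf 'Hello file' > "$data_root/file_hardlink/file.txt"
ln "$data_root/file_hardlink/file.txt" "$data_root/file_hardlink/hardlink.txt"

# many_small_files: 10 folders, each holding 10 text files.
for folder in 0 1 2 3 4 5 6 7 8 9; do
    mkdir -p "$data_root/many_small_files/folder_$folder"
    for file in 0 1 2 3 4 5 6 7 8 9; do
        printf 'Hello many_small_files' \
            > "$data_root/many_small_files/folder_$folder/file_$file.txt"
    done
done
```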

These are the test data sets that are generated by creating the necessary files on the fly and then archiving them, because either git or NuGet does not support those file types (the tests that consume these tar/tar.gz files need to manually verify the expected contents, instead of comparing with the unarchived folder contents):

  • file_longsymlink: Creates a text file with a very long name, then a symlink to that file.
  • longfilename_over100_under255: Creates a text file whose filename is over 100 bytes but under 255 bytes.
  • longpath_over255: Creates a text file whose filename is over 255 bytes.
  • longpath_splitable_under255: Creates a folder with a name length of 98 bytes (it excludes the separator and the null char), then creates a text file under that folder with a filename length of 99 bytes (excludes the null char).
  • specialfiles: Creates a fifo, a character device, and a block device.
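
The on-the-fly files could be produced along these lines. This is a sketch under assumptions: `make_name` is a hypothetical helper, the 150-byte length is just one value between 100 and 255, and `fifofile` is an illustrative name. The 98/99 split is relevant because the ustar header has a 100-byte name field and a 155-byte prefix field. Character and block devices are left out because creating them with `mknod` normally requires elevation.

```shell
#!/usr/bin/env bash
# Sketch only: helper and file names are hypothetical.
set -eu

# Prints a name made of $1 repetitions of the letter "a".
make_name() {
    awk -v n="$1" 'BEGIN { while (n-- > 0) printf "a" }'
}

work_dir="$(mktemp -d)"
long_name="$(make_name 150)"   # over 100 bytes, under 255 bytes
folder_name="$(make_name 98)"  # excludes the separator and the null char
file_name="$(make_name 99)"    # excludes the null char

mkdir -p "$work_dir/longfilename_over100_under255"
printf 'Hello file' > "$work_dir/longfilename_over100_under255/$long_name"

mkdir -p "$work_dir/longpath_splitable_under255/$folder_name"
printf 'Hello file' \
    > "$work_dir/longpath_splitable_under255/$folder_name/$file_name"

# A fifo can be created without elevation.
mkdir -p "$work_dir/specialfiles"
mkfifo "$work_dir/specialfiles/fifofile"
```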

@carlossanlop self-assigned this Apr 6, 2022
@@ -0,0 +1 @@
Hello file
\ No newline at end of file
Member

Instead of "Hello" in each file, could this have a single-line description of what scenario the file represents?

Contributor Author

I did it this way so I could verify the expected file contents with a message simple enough to apply to all tests.

What do you think about putting a description of each test case in the readme instead?


# Tar was executed elevated, need to ensure the
# generated archives are readable by current user
ResetOwnership $TargetDir
Member

We might consider adding some error handling in here to ensure the ownership is reset if something fails above.
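
One possible shape for such error handling, sketched in bash: an `EXIT` trap that runs the reset whether the archiving steps succeed or fail. The `ResetOwnership` body here is a hypothetical stand-in that only prints a message, `GenerateArchives` is an invented wrapper name, and `false` stands in for a failing elevated tar call; none of this is the script's actual code.

```shell
#!/usr/bin/env bash
# Sketch only: stand-in functions, no ownership is actually changed.

# Hypothetical stand-in for the script's ResetOwnership function.
ResetOwnership() {
    echo "reset ownership of $1"
}

# The body runs in a subshell, so the trap stays local to this call.
GenerateArchives() (
    TargetDir="$1"
    # Runs when the subshell exits, on success or on failure.
    trap 'ResetOwnership "$TargetDir"' EXIT
    echo "archiving into $TargetDir"
    false || exit 1   # placeholder for a failing elevated tar invocation
    echo "not reached"
)

output="$(GenerateArchives /tmp/example-target || true)"
echo "$output"
```

Here the trap fires even though the function exits early, so the reset message still appears after the failure.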

@carlossanlop (Contributor Author), Apr 13, 2022

I added instructions indicating that the script needs to be executed by a user with sudo permission.

Also, this script isn't expected to be executed often. If it does need to be run and something goes wrong, the problem would be easy to spot in the git status.

@@ -0,0 +1 @@
Hello many_small_files
Member

Are there any behaviors of the implementation that need to be tested with larger files, different content types, newlines, non-ASCII characters or other encoding scenarios, or anything else like that?

Contributor Author

The contents of the file don't really matter much. The header stores the length in bytes of the data section, and the data section can contain anything in that space of the specified size.

But I can add tests (no need to add test assets) that verify that string fields do not contain forbidden characters. For example, the Name field should not contain a newline.
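
The point about the header storing the data length can be checked by hand. The sketch below assumes GNU tar and the ustar layout, where the size field sits at offset 124 of the 512-byte header as octal digits followed by a terminator; the file name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU tar writing a ustar archive.
set -eu

work_dir="$(mktemp -d)"
printf 'Hello file' > "$work_dir/file.txt"
tar --format=ustar -cf "$work_dir/file.tar" -C "$work_dir" file.txt

# Read the 11 octal digits of the size field (offset 124 in the header).
size_field="$(dd if="$work_dir/file.tar" bs=1 skip=124 count=11 2>/dev/null)"
# A leading 0 makes printf interpret the digits as octal.
size_bytes="$(printf '%d' "0$size_field")"

# The data section starts right after the 512-byte header.
data="$(dd if="$work_dir/file.tar" bs=1 skip=512 count="$size_bytes" 2>/dev/null)"
echo "size=$size_bytes data=$data"
```

Whatever bytes are placed in the data section, a reader only needs the size from the header to know where the entry ends.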

@carlossanlop (Contributor Author)

@jeffhandley @adamsitnik @jozkee can I get a new review? I addressed some of the comments, and I would prefer not to address a couple of others.

@wfurt (Member) commented Apr 13, 2022

This may be beyond the scope of this PR, but it would be nice to also have samples with:

  • extended attributes
  • files whose names differ only in case, to check unpacking on case-insensitive systems
  • files with special characters, like ':' for Windows, spaces, '$', '\', '/', etc.
  • non-UTF-8 localized files
  • permissions beyond 777, e.g. the s and t bits

It may be hard or expensive, but also files bigger than MaxInt/MaxUint.

@carlossanlop (Contributor Author)

Those are great suggestions, @wfurt. I can add those but I would like to do it in a separate PR so I don't keep blocking the main runtime PR.

@carlossanlop merged commit c7d5591 into dotnet:main Apr 13, 2022
@carlossanlop deleted the SystemFormatTarAssets branch April 13, 2022 17:29