@afourney (Member) commented Mar 12, 2025

This PR substantially increases test coverage by refactoring the tests so that each file is tested through every entry point (local_file, streaming, and by URL), on both the CLI and the Python module.

This is critical because different entry points convey different information about the file being converted. For example:

  • if a local file is converted, the filename and extension are known, but not the MIME type or character encoding
  • if a URL is converted, the filename may not be known, but the MIME type and charset are often in the response headers
  • if a stream (e.g., stdin) is converted, nothing may be known about the file beyond its bytes

Earlier tests did not check all of these combinations, which led to several bugs.
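To make the difference between entry points concrete, here is a minimal sketch of the metadata each one can supply. It is illustrative only: `FileHints` and the three `hints_from_*` helpers are hypothetical names made up for this example and are not the project's actual API.

```python
from dataclasses import dataclass
from email.message import Message
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class FileHints:
    """Hypothetical container for what a converter knows about its input."""
    filename: Optional[str] = None
    extension: Optional[str] = None
    mimetype: Optional[str] = None
    charset: Optional[str] = None


def hints_from_local_file(path: str) -> FileHints:
    # A path yields a filename and extension, but no MIME type or charset.
    p = PurePosixPath(path)
    return FileHints(filename=p.name, extension=p.suffix or None)


def hints_from_url(url: str, content_type: Optional[str]) -> FileHints:
    # The URL path may not end in a usable filename, but the Content-Type
    # response header often carries the MIME type and charset.
    name = PurePosixPath(urlparse(url).path).name or None
    ext = (PurePosixPath(name).suffix or None) if name else None
    mimetype = charset = None
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        mimetype = msg.get_content_type()
        charset = msg.get_content_charset()
    return FileHints(name, ext, mimetype, charset)


def hints_from_stream() -> FileHints:
    # A bare stream (e.g., stdin) carries no metadata at all.
    return FileHints()
```

A test matrix built on this idea would run every sample file through all three helpers' corresponding entry points, so that a converter relying on a hint that one entry point cannot provide fails visibly.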

@afourney marked this pull request as ready for review March 12, 2025 05:14
@afourney requested a review from gagb March 12, 2025 05:15
@afourney (Member, Author) commented

Linking the special casing of 'test_mskanji.csv' in the stream tests to: google/magika#983

Confirmed via MD5 sum and a minimal test reproduction (see google/magika#983) that the correct bytes are being sent.
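For reference, a byte-level check of this kind can be sketched as follows. This is not the exact script used here: `md5_of_stream` is a hypothetical helper, and the payload is a stand-in for the real test file's contents.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of everything remaining in a binary stream."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Stand-in payload (an assumption); the real check would use the file's bytes.
payload = b"name,age\nalice,30\n"
expected = hashlib.md5(payload).hexdigest()

# If the stream delivers the same bytes as the file, the digests match.
assert md5_of_stream(io.BytesIO(payload)) == expected
```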

@afourney merged commit 5f75e16 into main Mar 12, 2025
3 checks passed
@afourney deleted the refactor_tests branch March 12, 2025 18:08