Skip to content

Conversation

@Cyan4973
Copy link
Contributor

@Cyan4973 Cyan4973 commented Dec 21, 2022

fix #3228

In situations where the source's alphabet size is very small, the evaluation of literal costs from the Optimal Parser is initially incorrect. It takes some time to converge, during which compression is less efficient.
This is especially important for small files, since most of the parsing decisions would be based on incorrect metrics.

After this patch, the scenario ##3228 is fixed,
delivering the expected 29 bytes compressed size (smallest known compressed size, down from 54).

On other "regular" data, this patch tends to be generally positive, though this is an average, and differences remain pretty small.
The patch seems to impact text data more, likely because it prunes non present alphabet symbols much faster.
On binary data with full alphabet, it's more balanced, and results typically vary by merely a few bytes (compared to dev), making it essentially a non-event.

Since this modification is only for high compression modes, the speed impact is insignificant.

fix #3328

In situations where the alphabet size is very small,
the evaluation of literal costs from the Optimal Parser is initially incorrect.
It takes some time to converge, during which compression is less efficient.
This is especially important for small files,
because there will not be enough data to converge,
so most of the parsing is selected based on incorrect metrics.

After this patch, the scenario ##3328 gets fixed,
delivering the expected 29 bytes compressed size (smallest known compressed size).
comparing level 19 to level 22 and expecting a stricter better result from level 22
is not that guaranteed,
because level 19 and 22 are very close to each other,
especially for small files,
so any noise in the final compression result
result in failing this test.

Level 22 could be compared to something much lower, like level 15,
But level 19 is required anyway, because there is a clamping test which depends on it.

Removed level 22, kept level 19
@Cyan4973 Cyan4973 merged commit 834fd07 into dev Jan 4, 2023
@Cyan4973 Cyan4973 deleted the fix3228 branch January 13, 2023 04:26
@Cyan4973 Cyan4973 mentioned this pull request Feb 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimal parser edge case: Huffman alone is significantly better than any matches

4 participants