Reduce RowHash's tag space size by x2 #3543
Conversation
terrelln left a comment:
Generally looks good!
I did a quick benchmark and see just about neutral performance, plus or minus a few percent depending on the compiler. Also a small ratio loss, but it seems reasonable. I'll let you do the full benchmarks; I just wanted to verify that the speed looked good on my machine too.
lib/compress/zstd_lazy.c (outdated)

```diff
 /* ZSTD_row_nextIndex():
  * Returns the next index to insert at within a tagTable row, and updates the "head"
  * value to reflect the update. Essentially cycles backwards from [0, {entries per row})
```
You should update this comment
I thought I did, thank you for noticing!
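For context, here is a minimal sketch of the head-cycling scheme this thread discusses, assuming the head byte lives in slot 0 of each row. The function name and types are illustrative, not the exact zstd implementation:

```c
#include <stdint.h>

/* Each tag row stores its "head" (the next slot to overwrite) in byte 0,
 * with one-byte tags in slots [1, entriesPerRow). Advancing the head
 * cycles backwards through the row and skips slot 0, which is reserved. */
static uint32_t row_next_index(uint8_t* const tagRow, uint32_t const rowMask)
{
    uint32_t next = (*tagRow - 1) & rowMask; /* step backwards, wrapping within the row */
    next += (next == 0) ? rowMask : 0;       /* slot 0 holds the head, never a tag */
    *tagRow = (uint8_t)next;                 /* persist the new head */
    return next;
}
```

With rowMask = entriesPerRow - 1, the head walks rowMask, rowMask-1, ..., 1, then wraps back to rowMask.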
lib/compress/zstd_lazy.c (outdated)

```diff
 {
     U32* const hashTable = ms->hashTable;
-    U16* const tagTable = ms->tagTable;
+    U8* const tagTable = ms->tagTable;
```
You should use BYTE here and elsewhere
lib/compress/zstd_lazy.c (outdated)

```diff
     assert(hash == ZSTD_hashPtr(base + updateStartIdx, hashLog + ZSTD_ROW_HASH_TAG_BITS, mls));
-    ((BYTE*)tagRow)[pos + ZSTD_ROW_HASH_TAG_OFFSET] = hash & ZSTD_ROW_HASH_TAG_MASK;
+    ((BYTE*)tagRow)[pos] = hash & ZSTD_ROW_HASH_TAG_MASK;
```
Can get rid of the BYTE* cast now.
@terrelln - did you benchmark on a mac or on an x86 server?
Force-pushed from cc96945 to faf5f15.
Fixed CR comments and added a spreadsheet for benchmarks. I've tried reducing this regression, but I don't have a good solution at the moment, and unless it's critical I won't pursue it further.
Allocate half the memory for the tag space, which means we get one less slot for an actual tag (it needs to be used for the next-position index). In turn, we slash memory usage for a slightly worse compression ratio, or a better ratio if we spend the same memory on a higher hashLog.
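For illustration, the layout this commit message describes might look like the following. The 16-entry row size and the type name are assumptions for the sketch, not a literal declaration from the source:

```c
#include <stdint.h>

/* One row of the tag table after this change: byte 0 doubles as the head,
 * so one of the 16 slots is given up and only 15 carry tags. Before the
 * change, each tag-table entry was two bytes (U16) instead of one. */
typedef struct {
    uint8_t head;     /* index of the next slot to overwrite, in [1, 16) */
    uint8_t tags[15]; /* one-byte hash tags */
} TagRow16;
```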
Force-pushed from faf5f15 to 69be424.
x86
terrelln left a comment:
Awesome!
…ys invalid
facebook#3543 decreases the size of the tagTable by a factor of 2, which requires using the first tag position in each row for the head position instead of a tag. Although position 0 stopped being a valid match, it still participated in the mask calculation, so the match loops could terminate earlier than they should. The fix skips position 0 to solve this problem.
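A minimal sketch of that fix, assuming a per-row match mask with one bit per entry; the helper name is illustrative:

```c
#include <stdint.h>

/* After comparing the query tag against every tag in a row, the match mask
 * holds one bit per row entry. Bit 0 corresponds to the head slot, which no
 * longer stores a tag, so it must be cleared before the match loop runs. */
static uint32_t drop_head_slot(uint32_t matchMask)
{
    return matchMask & ~(uint32_t)1; /* position 0 is always invalid */
}
```

Without this, a stray bit 0 would be consumed by the loop's find-first-set step and could make it terminate before it should.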
Hi, can this be reflected in the comment at https://github.com/facebook/zstd/blame/25822342be59d831bad65426ae51f5cc22157b09/lib/compress/zstd_lazy.c#L1116-L1118? Thanks!
Allocate half the memory for the tag space, which means we get one less slot for an actual tag (it needs to be used for the next-position index).
The result is a slight loss in compression ratio (up to 0.2%) and some speed regressions/improvements depending on level and sample. In turn, we save 16% of the hash table's space (5 bytes per entry instead of 6).
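To make the 16% figure concrete, here is a back-of-the-envelope sizing helper. It is a sketch, not a zstd API, and assumes one U32 hash entry plus a tag per position:

```c
#include <stddef.h>
#include <stdint.h>

/* Per position: a 4-byte U32 index plus tag bytes.
 * 4 + 2 = 6 bytes before, 4 + 1 = 5 bytes after: a ~16.7% reduction. */
static size_t row_hash_bytes(unsigned hashLog, size_t tagBytesPerEntry)
{
    size_t const entries = (size_t)1 << hashLog;
    return entries * (sizeof(uint32_t) + tagBytesPerEntry);
}

/* e.g. hashLog 18: row_hash_bytes(18, 2) = 1.5 MiB before,
 *                  row_hash_bytes(18, 1) = 1.25 MiB after. */
```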
Update:
CR comments fixed.
Benchmark in spreadsheet.
Update 2:
Another interesting benchmark is `fullbench -b41 -l9 enwik8.1k`, which repeatedly uses `ZSTD_compressStream` to recompress 1024 bytes out of enwik8. Since #3528 hasn't been deployed yet, we can observe that this PR makes the benchmark twice as fast for small data sizes, as the majority of the time is spent in tag space initialization.