-
Notifications
You must be signed in to change notification settings - Fork 0
Introduce salt into row hash (#3528) - only part 2 #13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: init-once-memory-3528-part1
Are you sure you want to change the base?
Introduce salt into row hash (#3528) - only part 2 #13
Conversation
- Adds memory type that is guaranteed to have been initialized at least once in the workspace's lifetime. - Changes tag space in row hash to be based on init once memory. - Moves buffers to aligned memory and removes the buffer memory type.
…o allocate them after aligned buffers
0273206 to
e18c951
Compare
| switch(mls) | ||
| { | ||
| default: | ||
| case 4: return ZSTD_hash4PtrS(p, hBits, (U32)hashSalt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather recommend to use the upper part of hashSalt ((U32)(hashSalt >> 32)),
as it's less predictable than the lower part.
An alternative approach could be to make hashSalt evolution of lower bits less predictable.
lib/compress/zstd_compress.c
Outdated
| /* We want to generate a new salt in case we reset a Cctx, but we always want to use | ||
| * 0 when we reset a Cdict */ | ||
| if(forWho == ZSTD_resetTarget_CCtx) { | ||
| ms->hashSalt = ms->hashSalt * 6364136223846793005 + 1; /* based on MUSL rand */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is probably enough time here to make hashSalt evolution more chaotic, closer to a prng.
a8f72d2 to
029b84f
Compare
This helps to avoid regressions where consecutive compressions use the same tag space with similar data (running `zstd -b5e7 enwik8 -B128K` reproduces this regression).
d4dff59 to
93dcd83
Compare
lib/compress/zstd_compress.c
Outdated
| * 0 when we reset a Cdict */ | ||
| if(forWho == ZSTD_resetTarget_CCtx) { | ||
| ms->tagTable = (U16 *) ZSTD_cwksp_reserve_aligned_init_once(ws, tagTableSize); | ||
| ms->hashSalt = (U64) ZSTD_hashPtr(&ms->hashSalt, 32, sizeof(ms->hashSalt)) << 32 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather go for a stronger mixing, making observation of any portion of bits essentially random (including lower bits).
I would recommend to base it on XXH3_rrmxmx() :
https://github.com/Cyan4973/xxHash/blob/v0.8.1/xxhash.h#L3403
This mixing has been tested and proved strong enough to pass practrand tests,
making it essentially as strong as a PCG random number generator.
93dcd83 to
2543295
Compare
lib/common/bits.h
Outdated
| return (value >> count) | (U64)(value << ((0U - count) & 0x3F)); | ||
| } | ||
|
|
||
| FORCE_INLINE_TEMPLATE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: do you need all 3 variants ?
Only ZSTD_rotateRight_U64() seems in use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All 3 variants are currently in use in ZSTD_row_matchMaskGroupWidth, I opted to moving all of them here from zstd_lazy.c instead of just one as it's the more natural place for them in any case.
70d69dc to
f4aab97
Compare
8f2280c to
a8c62ff
Compare
- Adds memory type that is guaranteed to have been initialized at least once in the workspace's lifetime. - Changes tag space in row hash to be based on init once memory.
This helps to avoid regressions where consecutive compressions use the same tag space with similar data (running
zstd -b5e7 enwik8 -B128Kreproduces this regression).