-
Notifications
You must be signed in to change notification settings - Fork 955
Fix use-after-free when active expiration triggers hashtable to shrink #2257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix use-after-free when active expiration triggers hashtable to shrink #2257
Conversation
Signed-off-by: Yakov Gusakov <[email protected]>
a6f17a1 to
3f9e7dd
Compare
|
Good finding! Maybe it is the fix for these two:
Both of these seem to be similar crashes in hashtableScanDefrag. Do you think they are the same? |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## unstable #2257 +/- ##
============================================
+ Coverage 71.47% 71.52% +0.05%
============================================
Files 123 123
Lines 66903 66904 +1
============================================
+ Hits 47816 47855 +39
+ Misses 19087 19049 -38
🚀 New features to boost your workflow:
|
I think it would be better if we added it, the test look ok to me, please find a plcae and add it. I think it's understandable that the defrag test is slower or takes longer. |
These two errors do look very similar. Probably a different flow in the ones you linked since the memory was cleared, which is why we see an access to 0x4 trying to call |
Sure, no problem. I looked through the test files and didn’t see an obvious place for it. I could add it to Do you have a preference? |
|
@gusakovy This is a great finding and fix. It is not really about defrag. (We need to rename the function hashtableScanDefrag because it is misleading.) When it is called without defrag callbacks it is just doing the scan. The problem is about scan + shrink. Your test with expire looks good to me. Put it in expire.tcl. It only tests that it doesn't crash. I can't see how it can be done with only the SCAN command (what we test in scan.tcl) because it doesn't delete keys in the callback. Another good place to put a test like this is the unit tests, src/unit/test_hashtable.c. |
|
@zuiderkwast Thanks for the help. After some more testing, I found that in its current form the test causes another ASAN error. I ran it in a loop with the fix applied and it sometimes causes a different memory leak. Not sure what causes this--maybe some edge case when doing a lot of short expiries: I think its best to just merge the fix and maybe open a separate issue for this. |
zuiderkwast
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'll merge this without the test case for now.
The leak uncovered by your test case is very interesting though. Let's continue and track this down next.
Fixes #2271 When we shrink a hash table and it is empty, we do it without iterating over it to rehash the entries. However, there may still be empty child buckets (`used[0]==0 && child_buckets[0]!=0`). These were leaked in this case. This fix is to check for child buckets and don't skip the incremental rehashing if any child buckets exist. The incremental rehashing pass will free them. An additional fix is to compact bucket chains in scan when the scan callback has deleted some entries. This was already implemented for the case when rehashing is ongoing but it was missing in the case rehashing is not ongoing. Additionally, a test case for #2257 was added. --------- Signed-off-by: yzc-yzc <[email protected]> Co-authored-by: Viktor Söderqvist <[email protected]> Co-authored-by: Yakov Gusakov <[email protected]>
|
Will it have the same problem when I call dictScanDefrag and dictShrinkIfNeeded? |
…y-io#2257) **Current state** During `hashtableScanDefrag`, rehashing is paused to prevent entries from moving, but the scan callback can still delete entries which triggers `hashtableShrinkIfNeeded`. For example, the `expireScanCallback` can delete expired entries. **Issue** This can cause the table to be resized and the old memory to be freed while the scan is still accessing it, resulting in the following memory access violation: ``` [err]: Sanitizer error: ================================================================= ==46774==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000003100 at pc 0x0000004704d3 bp 0x7fffcb062000 sp 0x7fffcb061ff0 READ of size 1 at 0x611000003100 thread T0 #0 0x4704d2 in isPositionFilled /home/gusakovy/Projects/valkey/src/hashtable.c:422 #1 0x478b45 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1768 #2 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729 #3 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402 #4 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297 #5 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269 #6 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577 #7 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370 valkey-io#8 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513 valkey-io#9 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543 valkey-io#10 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291 valkey-io#11 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139) valkey-io#12 0x452e39 in _start (/local/home/gusakovy/Projects/valkey/src/valkey-server+0x452e39) 0x611000003100 is located 0 bytes inside of 256-byte region [0x611000003100,0x611000003200) freed by thread T0 here: #0 0x7f471a34a1e5 in __interceptor_free (/lib64/libasan.so.4+0xd81e5) #1 0x4aefbc in zfree_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:400 #2 0x4aeff5 in valkey_free /home/gusakovy/Projects/valkey/src/zmalloc.c:415 #3 0x4707d2 in rehashingCompleted /home/gusakovy/Projects/valkey/src/hashtable.c:456 #4 0x471b5b in resize /home/gusakovy/Projects/valkey/src/hashtable.c:656 #5 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272 #6 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448 #7 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459 valkey-io#8 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847 valkey-io#9 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490 valkey-io#10 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831 valkey-io#11 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844 valkey-io#12 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70 valkey-io#13 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139 valkey-io#14 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770 valkey-io#15 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729 valkey-io#16 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402 valkey-io#17 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297 valkey-io#18 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269 valkey-io#19 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577 valkey-io#20 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370 valkey-io#21 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513 valkey-io#22 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543 valkey-io#23 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291 valkey-io#24 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139) previously allocated by thread T0 here: #0 0x7f471a34a753 in __interceptor_calloc (/lib64/libasan.so.4+0xd8753) #1 0x4ae48c in ztrycalloc_usable_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:214 #2 0x4ae757 in valkey_calloc /home/gusakovy/Projects/valkey/src/zmalloc.c:257 #3 0x4718fc in resize /home/gusakovy/Projects/valkey/src/hashtable.c:645 #4 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272 #5 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448 #6 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459 #7 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847 valkey-io#8 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490 valkey-io#9 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831 valkey-io#10 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844 valkey-io#11 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70 valkey-io#12 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139 valkey-io#13 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770 valkey-io#14 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729 valkey-io#15 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402 valkey-io#16 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297 valkey-io#17 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269 valkey-io#18 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577 valkey-io#19 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370 valkey-io#20 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513 valkey-io#21 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543 valkey-io#22 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291 valkey-io#23 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139) SUMMARY: AddressSanitizer: heap-use-after-free /home/gusakovy/Projects/valkey/src/hashtable.c:422 in isPositionFilled Shadow bytes around the buggy address: 0x0c227fff85d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c227fff85e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c227fff85f0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd 0x0c227fff8600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c227fff8610: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa =>0x0c227fff8620:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c227fff8630: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c227fff8640: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c227fff8650: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c227fff8660: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c227fff8670: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==46774==ABORTING ``` **Solution** Suggested solution is to also pause auto shrinking during `hashtableScanDefrag`. I noticed that there was already a `hashtablePauseAutoShrink` method and `pause_auto_shrink` counter, but it wasn't actually used in `hashtableShrinkIfNeeded` so I fixed that. **Testing** I created a simple tcl test that (most of the times) triggers this error, but it's a little clunky so I didn't add it as part of the PR: ``` start_server {tags {"expire hashtable defrag"}} { test {hashtable scan defrag on expiry} { r config set hz 100 set num_keys 20 for {set i 0} {$i < $num_keys} {incr i} { r set "key_$i" "value_$i" } for {set j 0} {$j < 50} {incr j} { set expire_keys 100 for {set i 0} {$i < $expire_keys} {incr i} { # Short expiry time to ensure they expire quickly r psetex "expire_key_${i}_${j}" 100 "expire_value_${i}_${j}" } # Verify keys are set set initial_size [r dbsize] assert_equal $initial_size [expr $num_keys + $expire_keys] after 150 for {set i 0} {$i < 10} {incr i} { r get "expire_key_${i}_${j}" after 10 } } set remaining_keys [r dbsize] assert_equal $remaining_keys $num_keys # Verify server is still responsive assert_equal [r ping] {PONG} } {} } ``` Compiling with ASAN using `make noopt SANITIZER=address valkey-server` and running the test causes error above. Applying the fix resolves the issue. Signed-off-by: Yakov Gusakov <[email protected]>
Fixes valkey-io#2271 When we shrink a hash table and it is empty, we do it without iterating over it to rehash the entries. However, there may still be empty child buckets (`used[0]==0 && child_buckets[0]!=0`). These were leaked in this case. This fix is to check for child buckets and don't skip the incremental rehashing if any child buckets exist. The incremental rehashing pass will free them. An additional fix is to compact bucket chains in scan when the scan callback has deleted some entries. This was already implemented for the case when rehashing is ongoing but it was missing in the case rehashing is not ongoing. Additionally, a test case for valkey-io#2257 was added. --------- Signed-off-by: yzc-yzc <[email protected]> Co-authored-by: Viktor Söderqvist <[email protected]> Co-authored-by: Yakov Gusakov <[email protected]>
**Current state**
During `hashtableScanDefrag`, rehashing is paused to prevent entries
from moving, but the scan callback can still delete entries which
triggers `hashtableShrinkIfNeeded`. For example, the
`expireScanCallback` can delete expired entries.
**Issue**
This can cause the table to be resized and the old memory to be freed
while the scan is still accessing it, resulting in the following memory
access violation:
```
[err]: Sanitizer error: =================================================================
==46774==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000003100 at pc 0x0000004704d3 bp 0x7fffcb062000 sp 0x7fffcb061ff0
READ of size 1 at 0x611000003100 thread T0
#0 0x4704d2 in isPositionFilled /home/gusakovy/Projects/valkey/src/hashtable.c:422
#1 0x478b45 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1768
#2 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#3 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#4 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#5 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#6 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#7 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#8 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#9 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#10 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#11 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
#12 0x452e39 in _start (/local/home/gusakovy/Projects/valkey/src/valkey-server+0x452e39)
0x611000003100 is located 0 bytes inside of 256-byte region [0x611000003100,0x611000003200)
freed by thread T0 here:
#0 0x7f471a34a1e5 in __interceptor_free (/lib64/libasan.so.4+0xd81e5)
#1 0x4aefbc in zfree_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:400
#2 0x4aeff5 in valkey_free /home/gusakovy/Projects/valkey/src/zmalloc.c:415
#3 0x4707d2 in rehashingCompleted /home/gusakovy/Projects/valkey/src/hashtable.c:456
#4 0x471b5b in resize /home/gusakovy/Projects/valkey/src/hashtable.c:656
#5 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
#6 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
#7 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
#8 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
#9 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
#10 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
#11 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
#12 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
#13 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
#14 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
#15 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#16 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#17 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#18 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#19 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#20 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#21 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#22 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#23 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#24 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
previously allocated by thread T0 here:
#0 0x7f471a34a753 in __interceptor_calloc (/lib64/libasan.so.4+0xd8753)
#1 0x4ae48c in ztrycalloc_usable_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:214
#2 0x4ae757 in valkey_calloc /home/gusakovy/Projects/valkey/src/zmalloc.c:257
#3 0x4718fc in resize /home/gusakovy/Projects/valkey/src/hashtable.c:645
#4 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
#5 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
#6 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
#7 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
#8 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
#9 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
#10 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
#11 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
#12 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
#13 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
#14 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#15 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#16 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#17 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#18 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#19 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#20 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
#21 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
#22 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
#23 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
SUMMARY: AddressSanitizer: heap-use-after-free /home/gusakovy/Projects/valkey/src/hashtable.c:422 in isPositionFilled
Shadow bytes around the buggy address:
0x0c227fff85d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff85e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff85f0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x0c227fff8600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8610: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
=>0x0c227fff8620:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8630: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8640: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8650: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8660: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8670: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==46774==ABORTING
```
**Solution**
Suggested solution is to also pause auto shrinking during
`hashtableScanDefrag`. I noticed that there was already a
`hashtablePauseAutoShrink` method and `pause_auto_shrink` counter, but
it wasn't actually used in `hashtableShrinkIfNeeded` so I fixed that.
**Testing**
I created a simple tcl test that (most of the times) triggers this
error, but it's a little clunky so I didn't add it as part of the PR:
```
start_server {tags {"expire hashtable defrag"}} {
test {hashtable scan defrag on expiry} {
r config set hz 100
set num_keys 20
for {set i 0} {$i < $num_keys} {incr i} {
r set "key_$i" "value_$i"
}
for {set j 0} {$j < 50} {incr j} {
set expire_keys 100
for {set i 0} {$i < $expire_keys} {incr i} {
# Short expiry time to ensure they expire quickly
r psetex "expire_key_${i}_${j}" 100 "expire_value_${i}_${j}"
}
# Verify keys are set
set initial_size [r dbsize]
assert_equal $initial_size [expr $num_keys + $expire_keys]
after 150
for {set i 0} {$i < 10} {incr i} {
r get "expire_key_${i}_${j}"
after 10
}
}
set remaining_keys [r dbsize]
assert_equal $remaining_keys $num_keys
# Verify server is still responsive
assert_equal [r ping] {PONG}
} {}
}
```
Compiling with ASAN using `make noopt SANITIZER=address valkey-server`
and running the test causes error above. Applying the fix resolves the
issue.
Signed-off-by: Yakov Gusakov <[email protected]>
Fixes #2271 When we shrink a hash table and it is empty, we do it without iterating over it to rehash the entries. However, there may still be empty child buckets (`used[0]==0 && child_buckets[0]!=0`). These were leaked in this case. This fix is to check for child buckets and don't skip the incremental rehashing if any child buckets exist. The incremental rehashing pass will free them. An additional fix is to compact bucket chains in scan when the scan callback has deleted some entries. This was already implemented for the case when rehashing is ongoing but it was missing in the case rehashing is not ongoing. Additionally, a test case for #2257 was added. --------- Signed-off-by: yzc-yzc <[email protected]> Co-authored-by: Viktor Söderqvist <[email protected]> Co-authored-by: Yakov Gusakov <[email protected]>
Current state
During
hashtableScanDefrag, rehashing is paused to prevent entries from moving, but the scan callback can still delete entries which triggershashtableShrinkIfNeeded. For example, theexpireScanCallbackcan delete expired entries.Issue
This can cause the table to be resized and the old memory to be freed while the scan is still accessing it, resulting in the following memory access violation:
Solution
Suggested solution is to also pause auto shrinking during
hashtableScanDefrag. I noticed that there was already ahashtablePauseAutoShrinkmethod andpause_auto_shrinkcounter, but it wasn't actually used inhashtableShrinkIfNeededso I fixed that.Testing
I created a simple tcl test that (most of the times) triggers this error, but it's a little clunky so I didn't add it as part of the PR:
Compiling with ASAN using
make noopt SANITIZER=address valkey-serverand running the test causes error above. Applying the fix resolves the issue.