Commit ad2c6b0
authored
Fix hashtablePauseAutoShrink and use it in hashtableScanDefrag (valkey-io#2257)
**Current state**
During `hashtableScanDefrag`, rehashing is paused to prevent entries
from moving, but the scan callback can still delete entries which
triggers `hashtableShrinkIfNeeded`. For example, the
`expireScanCallback` can delete expired entries.
**Issue**
This can cause the table to be resized and the old memory to be freed
while the scan is still accessing it, resulting in the following memory
access violation:
```
[err]: Sanitizer error: =================================================================
==46774==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000003100 at pc 0x0000004704d3 bp 0x7fffcb062000 sp 0x7fffcb061ff0
READ of size 1 at 0x611000003100 thread T0
#0 0x4704d2 in isPositionFilled /home/gusakovy/Projects/valkey/src/hashtable.c:422
#1 0x478b45 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1768
#2 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
#3 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
#4 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
#5 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
#6 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
#7 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
#8 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
valkey-io#9 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
valkey-io#10 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
valkey-io#11 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
valkey-io#12 0x452e39 in _start (/local/home/gusakovy/Projects/valkey/src/valkey-server+0x452e39)
0x611000003100 is located 0 bytes inside of 256-byte region [0x611000003100,0x611000003200)
freed by thread T0 here:
#0 0x7f471a34a1e5 in __interceptor_free (/lib64/libasan.so.4+0xd81e5)
#1 0x4aefbc in zfree_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:400
#2 0x4aeff5 in valkey_free /home/gusakovy/Projects/valkey/src/zmalloc.c:415
#3 0x4707d2 in rehashingCompleted /home/gusakovy/Projects/valkey/src/hashtable.c:456
#4 0x471b5b in resize /home/gusakovy/Projects/valkey/src/hashtable.c:656
#5 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
#6 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
#7 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
#8 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
valkey-io#9 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
valkey-io#10 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
valkey-io#11 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
valkey-io#12 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
valkey-io#13 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
valkey-io#14 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
valkey-io#15 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
valkey-io#16 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
valkey-io#17 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
valkey-io#18 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
valkey-io#19 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
valkey-io#20 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
valkey-io#21 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
valkey-io#22 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
valkey-io#23 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
valkey-io#24 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
previously allocated by thread T0 here:
#0 0x7f471a34a753 in __interceptor_calloc (/lib64/libasan.so.4+0xd8753)
#1 0x4ae48c in ztrycalloc_usable_internal /home/gusakovy/Projects/valkey/src/zmalloc.c:214
#2 0x4ae757 in valkey_calloc /home/gusakovy/Projects/valkey/src/zmalloc.c:257
#3 0x4718fc in resize /home/gusakovy/Projects/valkey/src/hashtable.c:645
#4 0x475bff in hashtableShrinkIfNeeded /home/gusakovy/Projects/valkey/src/hashtable.c:1272
#5 0x47704b in hashtablePop /home/gusakovy/Projects/valkey/src/hashtable.c:1448
#6 0x47716f in hashtableDelete /home/gusakovy/Projects/valkey/src/hashtable.c:1459
#7 0x480038 in kvstoreHashtableDelete /home/gusakovy/Projects/valkey/src/kvstore.c:847
#8 0x50c12c in dbGenericDeleteWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:490
valkey-io#9 0x515f28 in deleteExpiredKeyAndPropagateWithDictIndex /home/gusakovy/Projects/valkey/src/db.c:1831
valkey-io#10 0x516103 in deleteExpiredKeyAndPropagate /home/gusakovy/Projects/valkey/src/db.c:1844
valkey-io#11 0x6d8642 in activeExpireCycleTryExpire /home/gusakovy/Projects/valkey/src/expire.c:70
valkey-io#12 0x6d8706 in expireScanCallback /home/gusakovy/Projects/valkey/src/expire.c:139
valkey-io#13 0x478bd8 in hashtableScanDefrag /home/gusakovy/Projects/valkey/src/hashtable.c:1770
valkey-io#14 0x4789c2 in hashtableScan /home/gusakovy/Projects/valkey/src/hashtable.c:1729
valkey-io#15 0x47e3ca in kvstoreScan /home/gusakovy/Projects/valkey/src/kvstore.c:402
valkey-io#16 0x6d9040 in activeExpireCycle /home/gusakovy/Projects/valkey/src/expire.c:297
valkey-io#17 0x4859d2 in databasesCron /home/gusakovy/Projects/valkey/src/server.c:1269
valkey-io#18 0x486e92 in serverCron /home/gusakovy/Projects/valkey/src/server.c:1577
valkey-io#19 0x4637dd in processTimeEvents /home/gusakovy/Projects/valkey/src/ae.c:370
valkey-io#20 0x4643e3 in aeProcessEvents /home/gusakovy/Projects/valkey/src/ae.c:513
valkey-io#21 0x4647ea in aeMain /home/gusakovy/Projects/valkey/src/ae.c:543
valkey-io#22 0x4a61fc in main /home/gusakovy/Projects/valkey/src/server.c:7291
valkey-io#23 0x7f471957c139 in __libc_start_main (/lib64/libc.so.6+0x21139)
SUMMARY: AddressSanitizer: heap-use-after-free /home/gusakovy/Projects/valkey/src/hashtable.c:422 in isPositionFilled
Shadow bytes around the buggy address:
0x0c227fff85d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff85e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff85f0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x0c227fff8600: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8610: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
=>0x0c227fff8620:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8630: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c227fff8640: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8650: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8660: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8670: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==46774==ABORTING
```
**Solution**
Suggested solution is to also pause auto shrinking during
`hashtableScanDefrag`. I noticed that there was already a
`hashtablePauseAutoShrink` method and `pause_auto_shrink` counter, but
it wasn't actually used in `hashtableShrinkIfNeeded` so I fixed that.
**Testing**
I created a simple tcl test that (most of the times) triggers this
error, but it's a little clunky so I didn't add it as part of the PR:
```
start_server {tags {"expire hashtable defrag"}} {
test {hashtable scan defrag on expiry} {
r config set hz 100
set num_keys 20
for {set i 0} {$i < $num_keys} {incr i} {
r set "key_$i" "value_$i"
}
for {set j 0} {$j < 50} {incr j} {
set expire_keys 100
for {set i 0} {$i < $expire_keys} {incr i} {
# Short expiry time to ensure they expire quickly
r psetex "expire_key_${i}_${j}" 100 "expire_value_${i}_${j}"
}
# Verify keys are set
set initial_size [r dbsize]
assert_equal $initial_size [expr $num_keys + $expire_keys]
after 150
for {set i 0} {$i < 10} {incr i} {
r get "expire_key_${i}_${j}"
after 10
}
}
set remaining_keys [r dbsize]
assert_equal $remaining_keys $num_keys
# Verify server is still responsive
assert_equal [r ping] {PONG}
} {}
}
```
Compiling with ASAN using `make noopt SANITIZER=address valkey-server`
and running the test causes error above. Applying the fix resolves the
issue.
Signed-off-by: Yakov Gusakov <yaakov0015@gmail.com>1 parent 868721c commit ad2c6b0
1 file changed
Lines changed: 3 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1261 | 1261 | | |
1262 | 1262 | | |
1263 | 1263 | | |
1264 | | - | |
| 1264 | + | |
1265 | 1265 | | |
1266 | 1266 | | |
1267 | 1267 | | |
| |||
1753 | 1753 | | |
1754 | 1754 | | |
1755 | 1755 | | |
| 1756 | + | |
1756 | 1757 | | |
1757 | 1758 | | |
1758 | 1759 | | |
| |||
1857 | 1858 | | |
1858 | 1859 | | |
1859 | 1860 | | |
| 1861 | + | |
1860 | 1862 | | |
1861 | 1863 | | |
1862 | 1864 | | |
| |||
0 commit comments