@yzc-yzc
Contributor

@yzc-yzc yzc-yzc commented Jun 30, 2025

Fixes #2271

When we shrink a hash table and it is empty, we do it without iterating over it to rehash the entries. However, there may still be empty child buckets (used[0]==0 && child_buckets[0]!=0). These were leaked in this case.

The fix checks for child buckets and does not skip the incremental rehashing if any exist. The incremental rehashing pass then frees them.

An additional fix compacts bucket chains in scan when the scan callback has deleted some entries. This was already implemented for the case when rehashing is ongoing, but it was missing when rehashing is not ongoing.

Additionally, a test case for #2257 was added.
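
To make the condition in the description concrete, here is a minimal sketch of the two "can we skip rehashing?" checks. The struct and function names are hypothetical and only mirror the `used[0]` and `child_buckets[0]` counters mentioned above; this is not the actual code in src/hashtable.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the counters named in the description (assumed layout). */
typedef struct {
    size_t used[2];          /* number of entries per table */
    size_t child_buckets[2]; /* number of allocated child buckets per table */
} toy_hashtable;

/* Before the fix: an empty table is swapped out without walking it. */
bool toy_can_skip_rehash_before_fix(const toy_hashtable *ht) {
    assert(ht != NULL);
    return ht->used[0] == 0;
}

/* After the fix: empty child buckets force the incremental rehash pass,
 * which is the code path that frees them. */
bool toy_can_skip_rehash_after_fix(const toy_hashtable *ht) {
    assert(ht != NULL);
    return ht->used[0] == 0 && ht->child_buckets[0] == 0;
}
```

With `used[0] == 0` and `child_buckets[0] == 3`, the first check allows the shortcut (and the three child buckets would be orphaned), while the second one does not.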

@enjoy-binbin enjoy-binbin requested a review from zuiderkwast July 1, 2025 02:05
Member

@enjoy-binbin enjoy-binbin left a comment

I see we check the same condition in dict.c, so it should be fine. I did not dive into the details.

@codecov

codecov bot commented Jul 1, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.49%. Comparing base (d37dc52) to head (802c76c).
Report is 8 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2288      +/-   ##
============================================
+ Coverage     71.46%   71.49%   +0.02%     
============================================
  Files           123      123              
  Lines         66927    66941      +14     
============================================
+ Hits          47831    47857      +26     
+ Misses        19096    19084      -12     
Files with missing lines Coverage Δ
src/hashtable.c 82.31% <100.00%> (+0.92%) ⬆️

... and 18 files with indirect coverage changes

Contributor

@zuiderkwast zuiderkwast left a comment

Great finding!

I have a minor suggestion. If I'm wrong, we can merge it as is.

@zuiderkwast
Contributor

Should we add the test case from #2257 ?

I tested it locally and noticed it is slow. It takes 15 seconds. I have modified it to remove the sleeps and some loops. Now it takes 160ms. You can add it if you want:

start_server {tags {expire} overrides {hz 100}} {
    test {Active expiration triggers hashtable shrink} {
        set persistent_keys 5
        set volatile_keys 100
        set total_keys [expr {$persistent_keys + $volatile_keys}]

        for {set i 0} {$i < $persistent_keys} {incr i} {
            r set "key_$i" "value_$i"
        }
        for {set i 0} {$i < $volatile_keys} {incr i} {
            r psetex "expire_key_${i}" 100 "expire_value_${i}"
        }
        set table_size_before_expire [main_hash_table_size]

        # Verify keys are set
        assert_equal $total_keys [r dbsize]

        # Wait for active expiration
        wait_for_condition 100 50 {
            [r dbsize] eq $persistent_keys
        } else {
            fail "Keys not expired"
        }

        # Wait for the table to shrink and active rehashing to finish
        wait_for_condition 100 50 {
            [main_hash_table_size] < $table_size_before_expire
        } else {
            puts [r debug htstats 9]
            fail "Table didn't shrink"
        }

        # Verify server is still responsive
        assert_equal [r ping] {PONG}
    }
}

Co-authored-by: Viktor Söderqvist <[email protected]>
Signed-off-by: yzc-yzc <[email protected]>
@yzc-yzc
Contributor Author

yzc-yzc commented Jul 1, 2025

  1. According to the code comments shown below, pause_auto_shrink should not be used for safety issues (maybe just for performance), right?
/* Pauses automatic shrinking. This can be called before deleting a lot of
 * entries, to prevent automatic shrinking from being triggered multiple times.
 * Call hashtableResumeAutoShrink afterwards to restore automatic shrinking. */
void hashtablePauseAutoShrink(hashtable *ht) {
    ht->pause_auto_shrink++;
}
  2. During the scan function, we pause rehash, but why can we resize the hash table? Is it expected that pause_rehash doesn't work for resize?

@zuiderkwast The above two questions are bothering me, could you help me out? Thanks!
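
As a rough illustration of the performance motivation in the quoted comment, the sketch below uses a toy table (hypothetical names, a made-up shrink policy, and the assumption that resuming re-checks the shrink condition) to count how often a shrink is triggered with and without the pause counter. It is not the real hashtable.c logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy table that only tracks counts (assumed layout, not the real struct). */
typedef struct {
    int used;
    int capacity;
    int pause_auto_shrink; /* nesting counter, as in the quoted function */
    int shrink_count;      /* how many times a shrink was triggered */
} toy_ht;

/* Made-up policy: shrink when at most 1/8 of the capacity is in use. */
void toy_shrink_if_needed(toy_ht *ht) {
    if (ht->pause_auto_shrink > 0) return;
    if (ht->capacity > 4 && ht->used * 8 <= ht->capacity) {
        while (ht->capacity > 4 && ht->used * 8 <= ht->capacity) {
            ht->capacity /= 2;
        }
        ht->shrink_count++;
    }
}

void toy_pause_auto_shrink(toy_ht *ht) {
    ht->pause_auto_shrink++;
}

/* Assumption: resuming re-checks the shrink condition once. */
void toy_resume_auto_shrink(toy_ht *ht) {
    assert(ht->pause_auto_shrink > 0);
    ht->pause_auto_shrink--;
    toy_shrink_if_needed(ht);
}

void toy_delete(toy_ht *ht) {
    if (ht->used > 0) {
        ht->used--;
        toy_shrink_if_needed(ht);
    }
}

/* Delete 63 of 64 entries and report how many shrinks were triggered. */
int toy_count_shrinks(bool pause) {
    toy_ht ht = {64, 64, 0, 0};
    if (pause) toy_pause_auto_shrink(&ht);
    for (int i = 0; i < 63; i++) toy_delete(&ht);
    if (pause) toy_resume_auto_shrink(&ht);
    return ht.shrink_count;
}
```

In this toy model, `toy_count_shrinks(false)` triggers four separate shrinks while `toy_count_shrinks(true)` triggers one, which matches the "prevent automatic shrinking from being triggered multiple times" wording of the comment.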

@zuiderkwast
Contributor

> 1. According to the code comments shown below, pause_auto_shrink should not be used for safety issues (maybe just for performance), right?

@yzc-yzc That's right. It is used internally in hashtable.c for safety reasons but in the public API it should not be required for safety.

> 2. During the scan function, we pause rehash, but why can we resize the hash table? Is it expected that pause_rehash doesn't work for resize?

For the safety of scan, I think it's OK to resize but not to rehash during the scan, for example when the scan callback deletes entries. A resize only allocates a new table; no entries are moved, so it doesn't affect the scan algorithm.

However, if new entries are inserted while rehashing is paused (for example the scan callback inserts new entries, or in another situation where the rehashing is paused) they are inserted in the new table. If a lot of new entries are inserted, more than what can fit in the old table, I think it's good that resize is enabled even if rehashing is paused.

Makes sense?
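
The distinction between resize and rehash can be sketched with a toy model that only tracks counts. All names below are hypothetical and the growth policy (double when full) is an assumption for illustration; it is not how hashtable.c decides when to grow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: counts only, no real entries (assumed layout). */
typedef struct {
    size_t capacity[2]; /* capacity[1] == 0 means no new table is allocated */
    size_t used[2];     /* entries in the old (0) and the new (1) table */
    int pause_rehash;   /* > 0: entries must not move between tables */
} toy_ht;

bool toy_is_rehashing(const toy_ht *ht) {
    return ht->capacity[1] != 0;
}

/* Resize: allocate the new table only. No entry is moved here. */
void toy_resize(toy_ht *ht, size_t new_capacity) {
    assert(new_capacity > 0);
    if (!toy_is_rehashing(ht)) ht->capacity[1] = new_capacity;
}

/* Move one entry to the new table, unless rehashing is paused. */
void toy_rehash_step(toy_ht *ht) {
    if (!toy_is_rehashing(ht) || ht->pause_rehash > 0) return;
    if (ht->used[0] > 0) {
        ht->used[0]--;
        ht->used[1]++;
    }
    if (ht->used[0] == 0) {
        /* Old table drained: the new table becomes the main table. */
        ht->capacity[0] = ht->capacity[1];
        ht->used[0] = ht->used[1];
        ht->capacity[1] = 0;
        ht->used[1] = 0;
    }
}

/* Insert one entry. A full table triggers a resize even while paused;
 * new entries go to the new table once it exists. */
void toy_insert(toy_ht *ht) {
    if (!toy_is_rehashing(ht) && ht->used[0] == ht->capacity[0]) {
        toy_resize(ht, ht->capacity[0] * 2);
    }
    ht->used[toy_is_rehashing(ht) ? 1 : 0]++;
    toy_rehash_step(ht);
}
```

Starting from a full table of capacity 4 with `pause_rehash` set, three inserts allocate a new table of capacity 8 and land there, while the four old entries stay where the scan expects them. Once the pause is lifted, four rehash steps drain the old table.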

@gusakovy
Contributor

gusakovy commented Jul 1, 2025

I also went through a debugging process similar to the one @yzc-yzc described, and I want to suggest an alternative way to handle this edge case, which might be more efficient since you won't have to do any extra rehash steps.

The chained buckets are not released since hashtablePop does not compact the bucket chains when rehashing is paused:

int hashtablePop(hashtable *ht, const void *key, void **popped) {
  ...
        if (b->chained && !hashtableIsRehashingPaused(ht)) {
            /* Rehashing is paused while iterating and when a scan callback is
             * running. In those cases, we do the compaction in the scan and
             * iterator code instead. */
            fillBucketHole(ht, b, pos_in_bucket, table_index);
        }
   ...
}

According to the comment, in that case the compaction should be handled by the scan or iterator. For some reason we only call compactBucketChain in hashtableScanDefrag when the hashtable is rehashing, and I think we should add it for when we're not rehashing as well:

if (!hashtableIsRehashing(ht)) {
        /* Emit entries at the cursor index. */
-        bucket *b = &ht->tables[0][cursor & mask];
+        size_t idx = cursor & mask;
+        size_t used_before = ht->used[0];
+        bucket *b = &ht->tables[0][idx];
        do {
            if (b->presence != 0) {
                int pos;
                for (pos = 0; pos < ENTRIES_PER_BUCKET; pos++) {
                    if (isPositionFilled(b, pos)) {
                        void *emit = emit_ref ? &b->entries[pos] : b->entries[pos];
                        fn(privdata, emit);
                    }
                }
            }
            bucket *next = getChildBucket(b);
            if (next != NULL && defragfn != NULL) {
                next = bucketDefrag(b, next, defragfn);
            }
            b = next;
        } while (b != NULL);
+        /* If any entries were deleted, fill the holes. */
+        if (ht->used[0] < used_before) {
+            compactBucketChain(ht, idx, 0);
+        }

        /* Advance cursor. */
        cursor = nextCursor(cursor, mask);
    } else {
...
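
For readers unfamiliar with what compacting a bucket chain involves, here is a self-contained toy version. The real compactBucketChain is not shown in this thread, so the bucket layout, names, and algorithm below are assumptions made for illustration: holes in earlier buckets are filled with entries pulled from the tail of the chain, and child buckets that end up empty are freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_ENTRIES_PER_BUCKET 4

/* Toy bucket (assumed layout): bit i of presence marks entries[i] as filled. */
typedef struct toy_bucket {
    unsigned presence;
    int entries[TOY_ENTRIES_PER_BUCKET];
    struct toy_bucket *child;
} toy_bucket;

/* Find the last bucket after b that still holds an entry, or NULL. */
static toy_bucket *toy_last_nonempty_after(toy_bucket *b) {
    toy_bucket *found = NULL;
    for (toy_bucket *c = b->child; c != NULL; c = c->child) {
        if (c->presence != 0) found = c;
    }
    return found;
}

/* Fill holes from the tail of the chain, then free empty child buckets.
 * Returns the number of child buckets freed. */
size_t toy_compact_chain(toy_bucket *head) {
    assert(head != NULL);
    for (toy_bucket *b = head; b != NULL; b = b->child) {
        for (int pos = 0; pos < TOY_ENTRIES_PER_BUCKET; pos++) {
            if (b->presence & (1u << pos)) continue; /* not a hole */
            toy_bucket *src = toy_last_nonempty_after(b);
            if (src == NULL) break; /* nothing left to pull forward */
            int from = 0;
            while (!(src->presence & (1u << from))) from++;
            b->entries[pos] = src->entries[from];
            b->presence |= 1u << pos;
            src->presence &= ~(1u << from);
        }
    }
    /* Everything after the last non-empty bucket is now empty. */
    toy_bucket *keep = head;
    for (toy_bucket *c = head; c != NULL; c = c->child) {
        if (c->presence != 0) keep = c;
    }
    size_t freed = 0;
    toy_bucket *c = keep->child;
    keep->child = NULL;
    while (c != NULL) {
        toy_bucket *next = c->child;
        free(c);
        freed++;
        c = next;
    }
    return freed;
}
```

For a chain whose head holds one entry, followed by an empty child and a child with two entries, compaction moves both entries into the head and frees both children. The `used_before` comparison in the patch above is what decides whether such a pass is worth running at all.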

@zuiderkwast
Contributor

> According to the comment, in that case the compaction should be handled by the scan or iterator. For some reason we only call compactBucketChain in hashtableScanDefrag when the hashtable is rehashing, and I think we should add it for when we're not rehashing as well

@gusakovy Why is this useful? If rehashing is not paused, the compaction happens immediately when an entry is deleted. Am I missing something?

There is another scenario that can lead to holes and empty buckets: If the scan callback deletes some other entry which is not in the same bucket that was just scanned. Also if rehashing was paused and entries were deleted (without scan or iterator) we can get empty buckets. Therefore, I think @yzc-yzc's fix is still needed.

@gusakovy
Contributor

gusakovy commented Jul 1, 2025

> Why is this useful? If rehashing is not paused, the compaction happens immediately when an entry is deleted. Am I missing something?

During scan we pause rehashing so hashtableIsRehashingPaused(ht) is always true and hashtablePop will never compact.

In hashtableScanDefrag we condition on hashtableIsRehashing(ht), i.e. whether the hashtable is in the middle of rehashing or not, and for some reason we currently call compactBucketChain only when it is in the middle of rehashing.

> There is another scenario that can lead to holes and empty buckets: If the scan callback deletes some other entry which is not in the same bucket that was just scanned.

If that is possible then yes definitely the proposed fix is still needed.

@zuiderkwast
Contributor

zuiderkwast commented Jul 1, 2025

@gusakovy Gotcha. Rehashing paused when rehashing is not ongoing still means that compaction doesn't happen automatically.

Your fix is good. It's not required for fixing the leak but it's needed to clean up empty buckets during scan. For example if many entries are expired, it can cause a lot of empty buckets.

@yzc-yzc Can you include @gusakovy's patch in #2288 (comment) above?

@yzc-yzc
Contributor Author

yzc-yzc commented Jul 1, 2025

> Makes sense?

got it, thanks!

> @yzc-yzc Can you include @gusakovy's patch in #2288 (comment) above?

sure

@yzc-yzc
Contributor Author

yzc-yzc commented Jul 1, 2025

> Should we add the test case from #2257?

I ran this test on my PC and found that the probability of triggering the leak is very low (it triggered once in 257 runs).
My build command is make noopt SANITIZER=address valkey-server. The code is the last commit before this PR. Am I missing something?

@zuiderkwast
Contributor

zuiderkwast commented Jul 1, 2025

> I ran this test on my PC and found that the probability of triggering the leak is very low (it triggered once in 257 runs).
> My build command is make noopt SANITIZER=address valkey-server. The code is the last commit before this PR. Am I missing something?

This test is not for the leak. Its main point is to verify that active expiration triggers a hashtable shrink, which is what was actually implemented in #2257.

The original test case was too slow IMO. I think we don't really need a special test case to trigger the leak. If you think we need one, we should try to write a unit test in src/unit/test_hashtable.c which can run faster.

yzc-yzc and others added 3 commits July 2, 2025 00:02
…nction.

Written by Yakov Gusakov.

Co-authored-by: Yakov Gusakov <[email protected]>
Signed-off-by: yzc-yzc <[email protected]>
Written by Viktor Söderqvist

Co-authored-by: Viktor Söderqvist <[email protected]>
Signed-off-by: yzc-yzc <[email protected]>
@zuiderkwast zuiderkwast changed the title from "Fix hashtable resize function to handle edge case" to "Fix leak when shrinking a hashtable without entries" Jul 2, 2025
@zuiderkwast zuiderkwast added the release-notes label (This issue should get a line item in the release notes) Jul 2, 2025
@zuiderkwast zuiderkwast merged commit e53e048 into valkey-io:unstable Jul 2, 2025
51 of 52 checks passed
@github-project-automation github-project-automation bot moved this to To be backported in Valkey 8.1 Jul 2, 2025
@ranshid ranshid moved this from To be backported to In Progress in Valkey 8.1 Sep 30, 2025
ranshid pushed a commit to ranshid/valkey that referenced this pull request Sep 30, 2025
@ranshid ranshid moved this from In Progress to 8.1.4 in Valkey 8.1 Sep 30, 2025
@ranshid ranshid moved this from 8.1.4 to To be backported in Valkey 8.1 Sep 30, 2025
zuiderkwast added a commit that referenced this pull request Oct 1, 2025
@zuiderkwast zuiderkwast moved this from To be backported to 8.1.4 in Valkey 8.1 Oct 1, 2025
Labels

release-notes This issue should get a line item in the release notes

Projects

Status: 8.1.4

Development

Successfully merging this pull request may close these issues.

Leak when shrinking hash table

4 participants