Allow shrinking hashtables in low memory situations #2095
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff            @@
##          unstable    #2095    +/-  ##
=========================================
- Coverage     71.43%   71.39%   -0.04%
=========================================
  Files           123      123
  Lines         67030    67067      +37
=========================================
+ Hits          47881    47882       +1
- Misses        19149    19185      +36
```
zuiderkwast left a comment
I don't think we should allow a large allocation to go above maxmemory in this case. It's risky, as mentioned in the comment below. Especially when eviction is disabled.
When the maxmemory config is changed, it's a special case maybe. Here we can maybe do something smart, such as remember how much memory we were just using and reuse some of the memory we just freed for evicting keys and use it for shrinking the hashtable...?
For the case when maxmemory is not changed, this can happen only in some unusual(?) cases like you mentioned: you delete most top-level keys while at the same time filling up a single key (a hash, set, etc.).
We have had some discussions about hard and soft maxmemory limits, a separate maxmemory for keys and other ideas. See for example:
src/hashtable.c
Outdated
```c
if (ht->type->resizeAllowed) {
    double fill_factor = (double)min_capacity / ((double)numBuckets(old_exp) * ENTRIES_PER_BUCKET);
    if (fill_factor * 100 < MAX_FILL_PERCENT_HARD && !ht->type->resizeAllowed(exp > old_exp ? alloc_size : 0, fill_factor)) {
```
This effectively ignores the allocation size when shrinking. It means we allow memory usage to temporarily grow above maxmemory. This is risky in a system where maxmemory is set close to the available memory. In the worst case, the kernel's OOM killer will kill Valkey.
The maxmemory limit was never a guarantee that the memory usage of Valkey stays within those bounds, as is also explained in this comment inside evict.c:
Lines 498 to 502 in 4092c9c:

```c
 * It's possible for the server to suddenly be significantly over the "maxmemory"
 * setting. This can happen if there is a large allocation (like a hash table
 * resize) or even if the "maxmemory" setting is manually adjusted. Because of
 * this, it's important to evict for a managed period of time - otherwise the server
 * would become unresponsive while evicting.
```
OK. These are internal comments, though, not official documentation. But assuming this is the officially documented behavior, then I guess we can do this change.
After talking to @madolson about this PR, I think we can allow it.
Minor details: Just as we ignore and don't call the callback if we're over MAX_FILL_PERCENT_HARD, we can also ignore it if we're shrinking. I mean this:
```diff
-    if (fill_factor * 100 < MAX_FILL_PERCENT_HARD && !ht->type->resizeAllowed(exp > old_exp ? alloc_size : 0, fill_factor)) {
+    if (fill_factor * 100 < MAX_FILL_PERCENT_HARD &&
+        exp > old_exp &&
+        !ht->type->resizeAllowed(alloc_size, fill_factor)) {
```
Then we also don't need the change in evict.c
I did it this way because hashtableResizeAllowed has an additional check, `if (!server.dict_resizing) return 0;` (`/* For debug purposes, not allowed to be resized. */`), that I don't want to skip.
I moved the check up to the `if (ht->type->resizeAllowed && exp > old_exp) {` condition, and this test is failing now:
```
[err]: expire scan should skip dictionaries with lot's of empty buckets in tests/unit/expire.tcl
scan didn't handle slot skipping logic.
[1/1 done]: unit/expire (17 seconds)

The End

Execution time of different units:
17 seconds - unit/expire

!!! WARNING The following tests failed:

*** [err]: expire scan should skip dictionaries with lot's of empty buckets in tests/unit/expire.tcl
scan didn't handle slot skipping logic.
```
This test relies on the dict_resizing debug flag, and in order to fix the test I'd have to move the `if (!server.dict_resizing) return 0;` check into the resize function. That effectively makes the resizeAllowed callback function useless. Do we want to move the logic into the resize function and get rid of the resizeAllowed callback function?
There are several (too many) mechanisms to disable resizing. Another is to call hashtableSetResizePolicy(HASHTABLE_RESIZE_FORBID). This is done in updateDictResizePolicy() in server.c (which is called e.g. after fork()).
I suggest that we change it like this:
```diff
 void updateDictResizePolicy(void) {
-    if (server.in_fork_child != CHILD_TYPE_NONE) {
+    if (server.in_fork_child != CHILD_TYPE_NONE || !server.dict_resizing) {
         dictSetResizeEnabled(DICT_RESIZE_FORBID);
         hashtableSetResizePolicy(HASHTABLE_RESIZE_FORBID);
     } else if (hasActiveChildProcess()) {
         dictSetResizeEnabled(DICT_RESIZE_AVOID);
         hashtableSetResizePolicy(HASHTABLE_RESIZE_AVOID);
     } else {
         dictSetResizeEnabled(DICT_RESIZE_ENABLE);
         hashtableSetResizePolicy(HASHTABLE_RESIZE_ALLOW);
     }
 }
```

and in debug.c let's call this after handling `DEBUG dict-resizing (0|1)`:

```diff
 } else if (!strcasecmp(c->argv[1]->ptr, "dict-resizing") && c->argc == 3) {
     server.dict_resizing = atoi(c->argv[2]->ptr);
+    updateDictResizePolicy();
     addReply(c, shared.ok);
```

With this change, we can delete the check for `server.dict_resizing` in `hashtableResizeAllowed()`.
I think this will be a cleaner implementation. WDYT?
That would work, but I just realized that we may not be able to get rid of the `hashtableResizeAllowed` callback anyway, since I think we're using hashtable in non-server code as well, and calling `overMaxmemoryAfterAlloc()` to check for the `maxmemory` constraint would mean having to pull in evict.c and server.h.

I do think that the way I initially implemented and PR'd the change was already the most reliable one we could do without introducing a bunch of scary side effects and unknown gotchas.
> we may not be able to get rid of the `hashtableResizeAllowed` callback anyway

That's right. I didn't propose to get rid of `hashtableResizeAllowed` entirely, only to skip calling it when we're shrinking. Calling it with 0 bytes in that case is odd, and since we don't call the resizeAllowed callback at all when we're over MAX_FILL_PERCENT_HARD, skipping the callback in the same way when shrinking seems more consistent with that.
> and calling `overMaxmemoryAfterAlloc()` to check for the `maxmemory` constraint would mean having to pull in evict.c and server.h
No, why would we? The suggested changes in server.c and evict.c only affect server, not other usages such as valkey-benchmark, cli, etc.
> I do think that the way I initially implemented and PR'd the change was already the most reliable one we could do without introducing a bunch of scary side effects and unknown gotchas.
Are you talking about usages of hashtable outside of this repo?
Updated the PR with your proposed changes and included all the implementation changes to make it work. Please let me know if this looks good or if you'd like something changed. Note that a lot of the changes you see here were necessary to properly merge the dict-resizing flag into the resize_policy enum and to satisfy the `expire scan should skip dictionaries with lot's of empty buckets` test.
Yeah, this looks good to me. It's not that many changes.
An additional change worth mentioning is that FORBID is enforced now when expanding. It wasn't before. (The reason for this was that an earlier version of this hashtable was using open addressing which has an absolute limit: It could never be more than 100% full.)
There is a leak reported in the address-sanitizer job. Maybe it's fixed by #2288 that I merged yesterday. Can you merge the latest unstable and see if the leak is gone?
My latest force push, where I rebased on the unstable branch, already includes that commit; not sure what's going on. I'll take a closer look at it.
The address-sanitizer job https://github.com/valkey-io/valkey/actions/runs/16039519166/job/45258225409?pr=2095#step:7:437 complains about memory allocated by hashtableCreate. This is after a failed assertion here: https://github.com/valkey-io/valkey/actions/runs/16039519166/job/45258225409?pr=2095#step:7:109. When an assertion fails inside a test case, the test case is aborted, so we don't free the hash table. I think the leak is not a problem; when we fix the failed assert, this leak should go away. The unit test framework complains about a leak after each test suite because it only checks that all memory allocated by zmalloc so far has been freed, so these are false positives. It would be better if it checked this per test suite.
I hope this PR can prevent a memory leak from being reported multiple times:
Last commit should fix that test case as well now.
Wait for previous rehashing to complete before triggering shrinking. Otherwise, it will not shrink. Signed-off-by: Viktor Söderqvist <[email protected]>
During memory eviction, we may end up with a primary kv hash table that's below the 13% MIN_FILL_PERCENT_SOFT, but inside `resize()` for the kv table we check with `ht->type->resizeAllowed` whether allocating a new table crosses the configured `maxmemory` limit, without taking into account that shrinking, while temporarily increasing memory usage, ultimately reduces memory usage in all situations. Blocking hash tables from shrinking in situations where we would only temporarily cross the maxmemory limit has at least three negative side effects. For example, after lowering `maxmemory` by around 90% or writing large data to a single key, Valkey can end up in a situation where, no matter how many keys are evicted from the kv table, the table's bucket struct overhead uses more memory than the data stored in the table, causing all keys to be evicted from the table and Valkey becoming unusable until `maxmemory` is raised above the current memory usage.

This PR addresses these issues by allowing hash tables to allocate a new table for shrinking, even if allocating the new table causes the maxmemory limit to be crossed during the resize.
Additionally, the check for whether we should fast-forward the resize operation has been moved to before checking whether we are allowed to allocate a new resize table. Because we don't start a resize in this section of the function yet, a fast-forward usually drops the memory usage enough to allow another resize operation to start immediately after the fast-forward resize completes.
As a side effect of this PR, we now also refuse to start a hashtable resize if the policy is set to HASHTABLE_RESIZE_FORBID.