[Bug Fix] Optimize RDB Load Performance and Fix Cluster Mode Resizing #1199

naglera · 2024-10-21T09:11:31Z

This PR addresses two issues:

1. Performance degradation fix: resolves a significant performance issue during RDB load on replica nodes.
   - The problem caused replicas to rehash multiple times during the load process. Local testing demonstrated up to 50% degradation in BGSAVE time.
   - The problem occurs when the replica tries to expand pre-created slot dictionaries. This operation fails quietly, resulting in undetected performance issues.
   - This fix aims to optimize the RDB load process and restore expected performance levels.
2. Bug fix when reading RDB_OPCODE_RESIZEDB in Valkey 8.0 cluster mode:
   - Use the shard's master slots count when processing this opcode, as clusterNodeCoversSlot is not initialized for the currently syncing replica.
   - Previously, this problem went unnoticed because RDB_OPCODE_RESIZEDB had no practical impact (due to issue 1).

These improvements will enhance overall system performance and ensure smoother upgrades to Valkey 8.0 in the future.
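
To illustrate why a silently failing expand matters, here is a minimal sketch of a power-of-two table that counts how often it has to grow while keys are loaded. It is a toy model, not the Valkey dict: the `toy_*` names, the minimum size of 4, and the "grow at load factor 1" policy are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a power-of-two hash table. It tracks only the bucket count
 * and how many growth steps (each implying a rehash) inserts triggered. */
typedef struct {
    uint64_t buckets; /* current bucket count, 0 = not allocated yet */
    uint64_t used;    /* number of stored keys */
    unsigned resizes; /* growth steps triggered by inserts */
} toy_table;

/* Round up to the next power of two, minimum 4 (hypothetical policy). */
static uint64_t toy_next_pow2(uint64_t n) {
    uint64_t size = 4;
    while (size < n) size <<= 1;
    return size;
}

/* Pre-size the table, as a successful expand on a size hint would. */
static void toy_expand(toy_table *t, uint64_t hint) {
    uint64_t target = toy_next_pow2(hint);
    if (target > t->buckets) t->buckets = target;
}

/* Insert one key; grow (and count a rehash) once the table is full. */
static void toy_insert(toy_table *t) {
    assert(t != 0);
    if (t->buckets == 0) {
        t->buckets = 4;
    } else if (t->used == t->buckets) {
        t->buckets <<= 1;
        t->resizes++;
    }
    t->used++;
}

/* Load `keys` keys and return the number of growth steps. If `presize`
 * is nonzero, the size hint is applied before the first insert. */
unsigned toy_load(uint64_t keys, int presize) {
    toy_table t = {0, 0, 0};
    if (presize) toy_expand(&t, keys);
    for (uint64_t i = 0; i < keys; i++) toy_insert(&t);
    return t.resizes;
}
```

In this model, `toy_load(1000000, 1)` never grows the table, while `toy_load(1000000, 0)` doubles it repeatedly on the way up. A size hint that is read but never applied behaves like the second call.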

Testing:

- Conducted local tests to verify the performance improvement during RDB load.
- Verified that ignoring RDB_OPCODE_RESIZEDB does not negatively impact functionality in the current version.

This commit is meant to fix performance degradation caused by the replica having to rehash many times during RDB load. Testing locally, we were able to reproduce up to 50% degradation in BGSAVE time. Signed-off-by: naglera <[email protected]>

codecov · 2024-10-21T09:31:43Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.70%. Comparing base (2743b7e) to head (3f70b22).
Report is 170 commits behind head on unstable.

Additional details and impacted files

@@            Coverage Diff            @@
##           unstable    #1199   +/-   ##
=========================================
  Coverage     70.69%   70.70%           
=========================================
  Files           114      115    +1     
  Lines         63076    63159   +83     
=========================================
+ Hits          44594    44657   +63     
- Misses        18482    18502   +20

Files with missing lines	Coverage Δ
src/db.c	`89.08% <ø> (+0.30%)`	⬆️
src/kvstore.c	`96.28% <100.00%> (+<0.01%)`	⬆️

... and 28 files with indirect coverage changes

zuiderkwast

Good findings. Too bad we didn't fix this before 8.0. :)

hpatro · 2024-10-21T20:27:22Z

Reviewed the code from github.dev, it posted the comments twice 😂

Co-authored-by: Harkrishn Patro <[email protected]> Signed-off-by: Amit Nagler <[email protected]>

Signed-off-by: naglera <[email protected]>

enjoy-binbin

We have this approximate thing, do you mean it is not working in this case? Sorry, I did not look at the discussion deeply, just a quick look.

static int dbExpandGeneric(kvstore *kvs, uint64_t db_size, int try_expand) {
    int ret;
    if (server.cluster_enabled) {
        /* We don't know exact number of keys that would fall into each slot, but we can
         * approximate it, assuming even distribution, divide it by the number of slots. */
        int slots = getMyShardSlotCount();
        if (slots == 0) return C_OK;
        db_size = db_size / slots;
        ret = kvstoreExpand(kvs, db_size, try_expand, dbExpandSkipSlot);
    } else {
        ret = kvstoreExpand(kvs, db_size, try_expand, NULL);
    }

    return ret ? C_OK : C_ERR;
}

naglera · 2024-10-27T07:07:53Z

> We have this approximate thing, do you mean it is not working in this case? Sorry, I did not look at the discussion deeply, just a quick look.

@enjoy-binbin you're right that there's an approximation in place, but in this specific case, it's not working as intended. Let me explain why:

The issue arises when an 8.0 replica syncs from a 7.0 primary. In a scenario where the primary owns only a few slots, the replica's kvs->num_dicts (in kvstoreExpand) will still be equal to 16384. The problem is that we can't distinguish between owned and unowned slots based on this approximation alone.

This PR creates dictionaries upon kvstore expand when the dictionary is missing. Implementing only the first part of the PR will reveal this bug, so we might end up allocating space for all slots during expansion, including the unowned ones.
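
A rough sketch of the over-allocation risk described above, assuming a per-slot hint of `db_size / shard_slots` rounded up to a power of two. The `toy_*` helpers and the rounding policy are hypothetical; only the 16384 slot count and the even-distribution estimate come from the discussion.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_CLUSTER_SLOTS 16384

/* Round up to the next power of two, minimum 4 (hypothetical policy). */
static uint64_t toy_next_pow2(uint64_t n) {
    uint64_t size = 4;
    while (size < n) size <<= 1;
    return size;
}

/* Per-slot size hint: total keys spread evenly over the shard's slots. */
uint64_t toy_per_slot_hint(uint64_t db_size, int shard_slots) {
    assert(shard_slots > 0);
    return db_size / (uint64_t)shard_slots;
}

/* Total buckets reserved when `expanded_dicts` dictionaries each receive
 * the per-slot hint. Returns 0 when the hint itself is 0. */
uint64_t toy_reserved_buckets(uint64_t db_size, int shard_slots, int expanded_dicts) {
    uint64_t hint = toy_per_slot_hint(db_size, shard_slots);
    if (hint == 0) return 0;
    return toy_next_pow2(hint) * (uint64_t)expanded_dicts;
}
```

With 1,000,000 keys and a shard that owns 100 slots, expanding only the owned slots reserves 100 tables, while expanding every one of the `TOY_CLUSTER_SLOTS` dictionaries reserves the same per-slot size 16384 times over. That is why creating missing dictionaries is only safe once the owned slots can be told apart from the unowned ones.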

Signed-off-by: naglera <[email protected]>

enjoy-binbin · 2024-11-01T02:42:07Z

> The issue arises when an 8.0 replica syncs from a 7.0 primary. In a scenario where the primary owns only a few slots, the replica's kvs->num_dicts (in kvstoreExpand) will still be equal to 16384. The problem is that we can't distinguish between owned and unowned slots based on this approximation alone.
>
> This PR creates dictionaries upon kvstore expand when the dictionary is missing. Implementing only the first part of the PR will reveal this bug, so we might end up allocating space for all slots during expansion, including the unowned ones.

OK, thanks, now I see the point. Does this mean the approximate thing is no longer needed and we can remove the dead code?

enjoy-binbin · 2024-11-14T14:43:55Z

i re-read the code.

> The issue arises when an 8.0 replica syncs from a 7.0 primary. In a scenario where the primary owns only a few slots, the replica's kvs->num_dicts (in kvstoreExpand) will still be equal to 16384. The problem is that we can't distinguish between owned and unowned slots based on this approximation alone.

The kvs->num_dicts is indeed 16384, but we have getMyShardSlotCount, so the 8.0 replica knows the slot info and will use it. See the getMyShardSlotCount call:

static int dbExpandGeneric(kvstore *kvs, uint64_t db_size, int try_expand) {
    int ret;
    if (server.cluster_enabled) {
        /* We don't know exact number of keys that would fall into each slot, but we can
         * approximate it, assuming even distribution, divide it by the number of slots. */
        int slots = getMyShardSlotCount();
        if (slots == 0) return C_OK;
        db_size = db_size / slots;
        ret = kvstoreExpand(kvs, db_size, try_expand, dbExpandSkipSlot);
    } else {
        ret = kvstoreExpand(kvs, db_size, try_expand, NULL);
    }

    return ret ? C_OK : C_ERR;
}

> The problem occurs when the replica tries to expand pre-created slot dictionaries. This operation fails quietly, resulting in undetected performance issues.

int kvstoreExpand(kvstore *kvs, uint64_t newsize, int try_expand, kvstoreExpandShouldSkipDictIndex *skip_cb) {
    for (int i = 0; i < kvs->num_dicts; i++) {
        dict *d = kvstoreGetDict(kvs, i);
        if (!d || (skip_cb && skip_cb(i))) continue;
        int result = try_expand ? dictTryExpand(d, newsize) : dictExpand(d, newsize);
        if (try_expand && result == DICT_ERR) return 0;
    }

    return 1;
}

I think this fails for two reasons:

1. The dict is NULL. Calling createDictIfNeeded can fix it (make sure the size is > 0 before creating the dict).
2. The (skip_cb && skip_cb(i)) check. In dbExpandSkipSlot we check the wrong node; we should use the primary. In the old code, the slot dict got skipped, so we don't expand the dict at all.

/* CB passed to kvstoreExpand.
 * The purpose is to skip expansion of unused dicts in cluster mode (all
 * dicts not mapped to *my* slots) */
static int dbExpandSkipSlot(int slot) {
-    return !clusterNodeCoversSlot(getMyClusterNode(), slot);
+    return !clusterNodeCoversSlot(clusterNodeGetPrimary(getMyClusterNode()), slot);
}
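
The effect of the one-line change above can be sketched with a toy node model. This assumes, as the PR description suggests, that the syncing replica's own slot bitmap carries no slots and only the primary's does. The `toy_*` types and functions are made up for the example and do not mirror the real cluster structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 16384

typedef struct toy_node {
    unsigned char slots[TOY_SLOTS / 8]; /* bitmap of slots this node claims */
    struct toy_node *primary;           /* NULL when the node is a primary */
} toy_node;

void toy_assign_slot(toy_node *n, int slot) {
    assert(slot >= 0 && slot < TOY_SLOTS);
    n->slots[slot >> 3] |= (unsigned char)(1u << (slot & 7));
}

bool toy_covers_slot(const toy_node *n, int slot) {
    assert(slot >= 0 && slot < TOY_SLOTS);
    return (n->slots[slot >> 3] >> (slot & 7)) & 1;
}

/* A primary is its own primary; a replica points at the node it follows. */
const toy_node *toy_get_primary(const toy_node *n) {
    return n->primary ? n->primary : n;
}

/* Old check: ask the node itself, which skips every slot on a replica
 * whose own bitmap is empty. */
bool toy_skip_slot_self(const toy_node *me, int slot) {
    return !toy_covers_slot(me, slot);
}

/* Proposed check: ask the shard's primary instead. */
bool toy_skip_slot_primary(const toy_node *me, int slot) {
    return !toy_covers_slot(toy_get_primary(me), slot);
}
```

For a replica whose primary owns slot 42, `toy_skip_slot_self` reports the slot as skippable while `toy_skip_slot_primary` does not, which matches the "slot dict got skipped" behavior described in the second reason.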

Can this solve your problem?

 int kvstoreExpand(kvstore *kvs, uint64_t newsize, int try_expand, kvstoreExpandShouldSkipDictIndex *skip_cb) {
+    if (newsize == 0) return 1;
+
     for (int i = 0; i < kvs->num_dicts; i++) {
-        dict *d = kvstoreGetDict(kvs, i);
-        if (!d || (skip_cb && skip_cb(i))) continue;
+        if (skip_cb && skip_cb(i)) continue;
+
+        dict *d = createDictIfNeeded(kvs, i);
         int result = try_expand ? dictTryExpand(d, newsize) : dictExpand(d, newsize);
         if (try_expand && result == DICT_ERR) return 0;
     }
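
As a self-contained illustration of the difference between the two loops, here is a toy kvstore with lazily created dicts. Everything named `toy_*` is hypothetical, and unlike the real kvstoreExpand (which returns 1 or 0) these functions return the number of dicts they expanded so the difference is easy to observe.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint64_t buckets; } toy_dict;

typedef struct {
    toy_dict **dicts; /* lazily created: every entry starts out NULL */
    int num_dicts;
} toy_kvstore;

typedef int toy_skip_fn(int index);

toy_kvstore *toy_kvstore_create(int num_dicts) {
    toy_kvstore *kvs = malloc(sizeof(*kvs));
    if (!kvs) return NULL;
    kvs->dicts = calloc((size_t)num_dicts, sizeof(*kvs->dicts));
    if (!kvs->dicts) { free(kvs); return NULL; }
    kvs->num_dicts = num_dicts;
    return kvs;
}

void toy_kvstore_release(toy_kvstore *kvs) {
    for (int i = 0; i < kvs->num_dicts; i++) free(kvs->dicts[i]);
    free(kvs->dicts);
    free(kvs);
}

static toy_dict *toy_create_dict_if_needed(toy_kvstore *kvs, int i) {
    if (!kvs->dicts[i]) kvs->dicts[i] = calloc(1, sizeof(toy_dict));
    return kvs->dicts[i];
}

/* Old loop: a missing dict is skipped without any error. */
int toy_expand_old(toy_kvstore *kvs, uint64_t newsize, toy_skip_fn *skip) {
    assert(kvs != NULL);
    int expanded = 0;
    for (int i = 0; i < kvs->num_dicts; i++) {
        toy_dict *d = kvs->dicts[i];
        if (!d || (skip && skip(i))) continue;
        d->buckets = newsize;
        expanded++;
    }
    return expanded;
}

/* Proposed loop: skip unowned slots first, then create the dict on demand.
 * Returns -1 if an allocation fails. */
int toy_expand_new(toy_kvstore *kvs, uint64_t newsize, toy_skip_fn *skip) {
    assert(kvs != NULL);
    if (newsize == 0) return 0; /* nothing to reserve, create nothing */
    int expanded = 0;
    for (int i = 0; i < kvs->num_dicts; i++) {
        if (skip && skip(i)) continue;
        toy_dict *d = toy_create_dict_if_needed(kvs, i);
        if (!d) return -1;
        d->buckets = newsize;
        expanded++;
    }
    return expanded;
}

/* Example skip callback: pretend only even-numbered slots are owned. */
int toy_skip_odd(int index) { return index % 2 != 0; }
```

On a fresh store with 8 slots, the old loop expands nothing because every dict is still NULL, while the new loop creates and sizes the 4 owned ones and leaves the skipped slots unallocated.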

zuiderkwast

Great! This new fix is much better!

I think we shall backport it to 8.0, that is include it in 8.0.2.

For the description:

Valkey is spelled with a lowercase "k".
"CME" (AWS terminology) => "cluster mode".

enjoy-binbin

Good, the new code LGTM (i like my code), please also update the top comment and title, i guess it might be a little outdated (not good at wording). And also i see you have measured the efficiency before, can you measure it again?

naglera · 2024-11-18T10:37:25Z

I've conducted additional performance tests, and the results show that the fix from the 8.0 branch yields a significant improvement of over 30%. Here are the detailed results:

Unstable replica (before fix): Total sync time: approximately 19 seconds
After applying the fix: Total sync time: approximately 13 seconds

Test environment details:

Cluster mode: Disabled
Master configuration: 25 million keys, each with a 10-byte value
Replica: Running on a local machine

Redis configuration settings:

rdbcompression no
repl-diskless-sync yes
rdbchecksum no

Both tests loaded 25,310,000 keys from an RDB file of about 1.92 GB.

Logs

Unstable replica (before fix)

3417300:S 18 Nov 2024 10:11:21.165 * PRIMARY <-> REPLICA sync: Loading DB in memory
3417300:S 18 Nov 2024 10:11:21.170 * Loading RDB produced by Valkey version 255.255.255
3417300:S 18 Nov 2024 10:11:21.170 * RDB age 10 seconds
3417300:S 18 Nov 2024 10:11:21.170 * RDB memory usage when created 1918.82 Mb
3417300:S 18 Nov 2024 10:11:39.205 * Done loading RDB, keys loaded: 25310000, keys expired: 0.
3417300:S 18 Nov 2024 10:11:39.205 * PRIMARY <-> REPLICA sync: Finished with success

Replica after applying the fix

3417543:S 18 Nov 2024 10:14:09.649 * PRIMARY <-> REPLICA sync: Loading DB in memory
3417543:S 18 Nov 2024 10:14:09.649 * Loading RDB produced by Valkey version 255.255.255
3417543:S 18 Nov 2024 10:14:09.649 * RDB age 10 seconds
3417543:S 18 Nov 2024 10:14:09.649 * RDB memory usage when created 1918.84 Mb
3417543:S 18 Nov 2024 10:14:21.965 * Done loading RDB, keys loaded: 25310000, keys expired: 0.
3417543:S 18 Nov 2024 10:14:21.966 * PRIMARY <-> REPLICA sync: Finished with success
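
For anyone who wants to re-derive the improvement from the log timestamps above, a small sketch follows. The helper names are made up, and the parser assumes the `HH:MM:SS.mmm` layout with exactly three millisecond digits, as seen in these logs.

```c
#include <assert.h>
#include <stdio.h>

/* Parse "HH:MM:SS.mmm" into milliseconds since midnight; -1 on bad input.
 * Assumes exactly three fractional digits, as in the logs above. */
long toy_parse_ms(const char *ts) {
    int h, m, s, ms;
    if (sscanf(ts, "%d:%d:%d.%d", &h, &m, &s, &ms) != 4) return -1;
    return ((h * 60L + m) * 60L + s) * 1000L + ms;
}

/* Elapsed milliseconds between two timestamps on the same day. */
long toy_elapsed_ms(const char *start, const char *end) {
    long a = toy_parse_ms(start);
    long b = toy_parse_ms(end);
    assert(a >= 0 && b >= a);
    return b - a;
}

/* Reduction in load time, in percent of the "before" duration. */
double toy_reduction_pct(long before_ms, long after_ms) {
    assert(before_ms > 0);
    return 100.0 * (double)(before_ms - after_ms) / (double)before_ms;
}
```

Measured from "Loading DB in memory" to "Done loading RDB", the run before the fix takes 18,040 ms and the run after the fix takes 12,316 ms, a reduction of roughly 31.7%, consistent with the "over 30%" figure.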

…a side (valkey-io#1199) This PR addresses two issues: 1. Performance Degradation Fix - Resolves a significant performance issue during RDB load on replica nodes. - The problem was causing replicas to rehash multiple times during the load process. Local testing demonstrated up to 50% degradation in BGSAVE time. - The problem occurs when the replica tries to expand pre-created slot dictionaries. This operation fails quietly, resulting in undetected performance issues. - This fix aims to optimize the RDB load process and restore expected performance levels. 2. Bug fix when reading `RDB_OPCODE_RESIZEDB` in Valkey 8.0 cluster mode- - Use the shard's master slots count when processing this opcode, as `clusterNodeCoversSlot` is not initialized for the currently syncing replica. - Previously, this problem went unnoticed because `RDB_OPCODE_RESIZEDB` had no practical impact (due to 1). These improvements will enhance overall system performance and ensure smoother upgrades to Valkey 8.0 in the future. Testing: - Conducted local tests to verify the performance improvement during RDB load. - Verified that ignoring `RDB_OPCODE_RESIZEDB` does not negatively impact functionality in the current version. Signed-off-by: naglera <[email protected]> Co-authored-by: Binbin <[email protected]>

… on replica side (#1199) (#1328) This PR addresses two issues: 1. Performance Degradation Fix - Resolves a significant performance issue during RDB load on replica nodes. - The problem was causing replicas to rehash multiple times during the load process. Local testing demonstrated up to 50% degradation in BGSAVE time. - The problem occurs when the replica tries to expand pre-created slot dictionaries. This operation fails quietly, resulting in undetected performance issues. - This fix aims to optimize the RDB load process and restore expected performance levels. 2. Bug fix when reading `RDB_OPCODE_RESIZEDB` in Valkey 8.0 cluster mode- - Use the shard's master slots count when processing this opcode, as `clusterNodeCoversSlot` is not initialized for the currently syncing replica. - Previously, this problem went unnoticed because `RDB_OPCODE_RESIZEDB` had no practical impact (due to 1). These improvements will enhance overall system performance and ensure smoother upgrades to Valkey 8.0 in the future. Testing: - Conducted local tests to verify the performance improvement during RDB load. - Verified that ignoring `RDB_OPCODE_RESIZEDB` does not negatively impact functionality in the current version. (cherry picked from commit c5012cc) Signed-off-by: naglera <[email protected]> Co-authored-by: Binbin <[email protected]>

#2466) If we want to expand kvstoreHashtableExpand, we need to make sure the hashtable exists. Currently, when processing RDB slot-info, our expand has no effect because the hashtable does not exist (we initialize it only when we need it). We also update kvstoreExpand to use the kvstoreHashtableExpand to make sure there is only one code path. Also see #1199 for more details. Signed-off-by: Binbin <[email protected]>

When reading RDB files with information about the number of keys per cluster slot, we need to create the dicts if they don't exist. Currently, when processing RDB slot-info, our expand has no effect because the dict does not exist (we initialize it only when we need it). We also update kvstoreExpand to use the kvstoreDictExpand to make sure there is only one code path. Also see valkey-io#1199 for more details. Signed-off-by: Binbin <[email protected]>

naglera force-pushed the rdb-load-fix branch from 0442f8b to 3d51cdd Compare October 21, 2024 09:17

zuiderkwast reviewed Oct 21, 2024

hpatro reviewed Oct 21, 2024

naglera and others added 2 commits October 22, 2024 09:53

Update src/rdb.c

4723a4f

Co-authored-by: Harkrishn Patro <[email protected]> Signed-off-by: Amit Nagler <[email protected]>

dict expand if missing by default

465ded6

Signed-off-by: naglera <[email protected]>

enjoy-binbin reviewed Oct 25, 2024

zuiderkwast reviewed Oct 28, 2024

Fix documentation

a394c0a

Signed-off-by: naglera <[email protected]>

zuiderkwast approved these changes Oct 28, 2024

ranshid approved these changes Oct 31, 2024

Enable slot size estimation for legacy RDB load. The slot count shoul…

3f70b22

…d be determined by master node info Signed-off-by: naglera <[email protected]>

zuiderkwast approved these changes Nov 17, 2024

enjoy-binbin approved these changes Nov 17, 2024

naglera changed the title ~~[Bug Fix] On RDB load, after reading db size opcode forcefully expand main dict~~ [Bug Fix] Optimize RDB Load Performance and Fix Cluster Mode Resizing Nov 18, 2024

enjoy-binbin merged commit c5012cc into valkey-io:unstable Nov 18, 2024
48 checks passed

madolson added the release-notes This issue should get a line item in the release notes label Jan 6, 2025

enjoy-binbin mentioned this pull request Aug 11, 2025

Fix pre-size hashtables per slot when reading RDB files #2466

Merged
