@nitaicaro (Contributor) commented Feb 26, 2025

Introduction

This PR introduces a new feature that enables replicas to perform disk-based synchronization on a dedicated background thread (Bio thread). Benchmarking results demonstrate significant improvements in synchronization duration. In extreme cases, this optimization allows syncs that would have previously failed to succeed.

Problem Statement

Some administrators prefer the disk-based full synchronization mode for replicas. This mode allows replicas to continue serving clients with data while downloading the RDB file.

Valkey's predominantly single-threaded nature creates a challenge: serving client read requests and saving data from the socket to disk are not truly concurrent operations. In practice, the replica alternates between processing client requests and replication data, leading to inefficient behavior and prolonged sync durations, especially under high load.

Proposed Solution

To address this, the solution offloads the task of downloading the RDB file from the socket to a background thread. This allows the main thread to focus exclusively on handling client read requests while the background thread handles communication with the primary.

Benchmarking Results

Potential for Improvement

In theory, this optimization can lead to unbounded improvement in sync duration. By eliminating competition between client read events and socket communication (i.e., the events involved in downloading the RDB from the primary), sync times become independent of load: the main thread handles only client reads, while the background thread focuses on the RDB download, allowing the system to perform consistently even under high load.

The full valkey-benchmark commands can be found in the appendix below.

Sync Duration with Feature Disabled (times in seconds)

16 threads, 64 clients: 172 seconds
32 threads, 128 clients: 436 seconds
48 threads, 192 clients: 710 seconds

Sync Duration with Feature Enabled (times in seconds)

16 threads, 64 clients: 33 seconds (80.8% improvement)
32 threads, 128 clients: 33 seconds (92.4% improvement)
48 threads, 192 clients: 33 seconds (95.3% improvement)


Alternative Solutions Considered

IO Threads
IO threads do not have an advantage over Bio threads in this case: the save-to-disk job is rare (most likely no more than a few executions in a replica's lifetime), and there is never more than one execution at a time. A Bio thread is the better fit for a single, slow, long-running operation.

io_uring
For a single connection, io_uring doesn't provide much of a performance boost, because its primary advantage comes from batching many I/O operations together to reduce syscall overhead. With just one connection, there aren't enough operations to benefit significantly from these optimizations.

Prioritizing primary's socket in the event loop
This approach would help, but less effectively than using a Bio thread: the main thread would still need to allocate attention to handling read requests, which limits the benefit. It could be more useful on smaller instance types with limited CPU cores. Edit: in practice, this feature achieves this naturally when there is a single core - with both threads running on the same core, the Bio thread can now get up to 50% of CPU time to handle the RDB download, whereas previously RDB download events were queued behind many read events (and got nowhere near 50%).

Code Design

This PR introduces both the new BIO-based RDB sync flow described above and a significant (and much-needed) refactoring of the readSyncBulkPayload function. Previously, this single monolithic function handled the entire replication flow, covering both disk-based and socket-based syncs, and was responsible for everything from sync preparation and RDB reception to loading and finalization. The refactoring splits these responsibilities into clearer, more modular components, improving readability, reusability, and maintainability.

Disk-Based RDB Sync (Bio thread)

The old disk-based RDB save logic has been removed from the main thread. It is now exclusively handled by a dedicated Bio thread, following the team’s decision to remove the need for a config.

New Flow Overview:

After the replication handshake completes in syncWithPrimary() (or at the end of the dual-channel handshake), the replica determines whether to use disk-based sync via useDisklessLoad() (unchanged).

If disk sync is chosen, a read handler is set on the primary’s connection:

connSetReadHandler(conn, receiveRDBinBioThread);
When the primary begins sending the RDB payload, this read handler is triggered. It creates a dedicated Bio thread to perform the sync.

The Bio thread executes replicaReceiveRDBFromPrimaryToDisk, which is now the main handler for receiving and saving RDB to disk.
This function replaces the disk logic previously found in readSyncBulkPayload, but executes in a busy loop (as opposed to being event-driven like readSyncBulkPayload) since the Bio thread is single-purpose and not event-based.

The logic inside replicaReceiveRDBFromPrimaryToDisk is a direct port of the previous flow, with some refactoring for clarity and reusability across sync modes.

Upon completion or failure, the Bio thread signals the main thread using a shared variable (replica_bio_disk_save_state).

The main thread detects Bio completion in a new function called handleBioThreadFinishedRDBDownload() (triggered via replication cron). This function then initiates the RDB load and resets relevant stats.

Note: The actual RDB loading remains on the main thread, as it already halts I/O during load and doesn’t benefit from thread offloading, simplifying thread-safety concerns.

Diskless Sync (Socket)

For socket-based sync, the logic remains asynchronous and event-driven, but it has been refactored for modularity and readability.

The original implementation in readSyncBulkPayload mixed socket and disk sync logic with heavy branching.

Now, memory-based sync is isolated in a new function:

replicaReceiveRDBFromPrimaryToMemory
This function retains the same execution pattern as readSyncBulkPayload (i.e., reacting to available data on the socket), but is now cleaner and easier to follow.

Common logic that was previously duplicated across both sync modes is now factored out into shared helpers:

replicaBeforeLoadPrimaryRDB()
replicaAfterLoadPrimaryRDB()

The result is a more maintainable and modular flow. Reviewers are encouraged to compare this function side-by-side with the previous readSyncBulkPayload to follow the changes more easily.

Metrics

Previously, replication metrics such as repl_transfer_size and repl_transfer_read were tracked on the main thread inside readSyncBulkPayload. With the move to Bio threads, those variables could not be safely reused due to thread safety concerns.

To address this:

New Bio-specific metrics are introduced:

server.bio_repl_transfer_size
server.bio_repl_transfer_read

These are updated by the Bio thread during disk-based syncs.

At the end of the sync, the values are merged back into the standard metrics (server.repl_transfer_*) for consistency.

If INFO is queried during an ongoing Bio-based sync, the reported values reflect either the Bio-specific metrics or a combined view when needed, ensuring accurate observability.

Appendix:

Benchmarking Setup

  • Client machine: AWS c5a.16xlarge
  • Server machines: AWS c5a.2xlarge
# Step 1: Fill the primary and replica DBs with 6GB of data:

./valkey-benchmark -h <host> -p <port> -l -d 128 -t set -r 30000000 --threads 16 -c 64

# Step 2: Initiate heavy read load on the replica:

./valkey-benchmark -h <host> -p <port> -t get -r 30000000 --threads <t> -c <c> -n 1000000000 -P <P>

# Step 3: Enable/disable the config controlling the new feature:

./valkey-cli -h <host> -p <port> config set replica-save-to-disk-in-bio-thread <yes/no>

# Step 4: Initiate sync:

./valkey-cli -h <replica host> -p <replica port> replicaof <primary host> <primary port>

@xbasel xbasel self-requested a review February 26, 2025 10:34
@nitaicaro nitaicaro changed the title save to disk in bio thread - draft Save to Disk in Bio Thread - draft Feb 26, 2025
codecov bot commented Feb 26, 2025

Codecov Report

❌ Patch coverage is 88.57143% with 40 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.52%. Comparing base (8df8c84) to head (ca663c1).
⚠️ Report is 47 commits behind head on unstable.

Files with missing lines Patch % Lines
src/replication.c 87.76% 40 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1784      +/-   ##
============================================
+ Coverage     71.49%   71.52%   +0.03%     
============================================
  Files           123      123              
  Lines         67179    67348     +169     
============================================
+ Hits          48028    48174     +146     
- Misses        19151    19174      +23     
Files with missing lines Coverage Δ
src/bio.c 84.93% <100.00%> (+0.48%) ⬆️
src/server.c 88.09% <100.00%> (+0.02%) ⬆️
src/server.h 100.00% <ø> (ø)
src/replication.c 87.06% <87.76%> (+0.33%) ⬆️

... and 11 files with indirect coverage changes


@nitaicaro nitaicaro force-pushed the replica-save-to-disk-in-bio-thread branch 5 times, most recently from 2d9c776 to 466a0ca Compare March 4, 2025 14:20
@nitaicaro nitaicaro force-pushed the replica-save-to-disk-in-bio-thread branch 8 times, most recently from ad1b8fc to f1418b1 Compare March 11, 2025 12:15
@xbasel (Member) left a comment:

Initial comments.

@nitaicaro nitaicaro force-pushed the replica-save-to-disk-in-bio-thread branch 2 times, most recently from 485e7d2 to 3bd80d3 Compare March 24, 2025 12:39
@nitaicaro nitaicaro changed the title Save to Disk in Bio Thread - draft Save to Disk in Bio Thread Apr 1, 2025
@nitaicaro nitaicaro marked this pull request as ready for review April 1, 2025 15:18
@ranshid ranshid added the major-decision-pending Major decision pending by TSC team label May 5, 2025
@ranshid (Member) commented May 5, 2025

Adding the major-decision-pending label. Even though this is not a breaking change, I think adding another worker thread should be carefully considered.

@madolson (Member) left a comment:

Major decision is approved. No config.

@madolson madolson added major-decision-approved Major decision approved by TSC team and removed major-decision-pending Major decision pending by TSC team labels May 19, 2025
@madolson (Member) commented:
@Fusl In the weekly meeting we thought about your workload when reviewing this feature and deciding if we should enable it by default. Do you think you would want to have this feature enabled by default?

@nitaicaro nitaicaro force-pushed the replica-save-to-disk-in-bio-thread branch from d48d9d3 to f6cfe65 Compare July 24, 2025 13:01
@nitaicaro nitaicaro force-pushed the replica-save-to-disk-in-bio-thread branch from 3a07583 to c821bfd Compare July 27, 2025 13:04
@nitaicaro nitaicaro force-pushed the replica-save-to-disk-in-bio-thread branch from 0957033 to c844508 Compare July 28, 2025 10:55
Nitai Caro added 2 commits July 28, 2025 11:55
…emove duplicate calls to resetBioRDBSaveState

Signed-off-by: Nitai Caro <[email protected]>
Signed-off-by: Nitai Caro <[email protected]>
@nitaicaro nitaicaro force-pushed the replica-save-to-disk-in-bio-thread branch from 7b578d8 to a4327c3 Compare July 29, 2025 08:37
@ranshid (Member) left a comment:

LGTM thank you @nitaicaro - good work!

@ranshid ranshid merged commit 025f416 into valkey-io:unstable Jul 31, 2025
98 of 102 checks passed
@github-project-automation github-project-automation bot moved this from Optional for next release candidate to Done in Valkey 9.0 Jul 31, 2025
@ranshid ranshid added the release-notes This issue should get a line item in the release notes label Aug 13, 2025