Replication Flow Control – Prioritizing replication traffic in the replica #1838

xbasel · 2025-03-11T17:29:17Z

Overview

This PR introduces Replication Flow Control (repl-flow-control), a dynamic mechanism that prioritizes replication traffic on the replica side. By detecting replication pressure and adjusting read frequency adaptively, it reduces the risk of primary buffer overflows and full syncs.

Problem

In high-load scenarios, a replica might not consume replication data fast enough, leading to backpressure on the primary. As the primary’s buffer fills, replication lag increases; once the buffer overflows, the primary can drop the replica connection, triggering a full sync, a costly operation that impacts system performance.

Without this feature:

- Replication reads occur at a fixed rate, irrespective of data pressure.
- If the replica falls behind, the primary accumulates replication data, leading to higher memory utilization.
- Once the primary buffer overflows, the connection drops, forcing a full sync.
- Full syncs cause high memory, CPU, and I/O spikes.

Solution: Replication Flow Control

repl-flow-control enables the replica to dynamically increase its replication read rate if it detects that replication data is accumulating. The mechanism operates as follows:

Detecting replication pressure
Each read from the primary is checked against the maximum read buffer size in bytes. If a read hits the limit (fills the buffer), more data is likely available.

Prioritizing replication reads
If replication pressure is detected, the replica invokes multiple reads per I/O event instead of a single one. This allows the replica to catch up faster, reducing memory consumption and buffer overflows on the primary.
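
The two steps above can be sketched in C roughly as follows. This is a minimal illustration, not the code from this PR: `drain_replication_fd`, `drain_result`, and `MAX_READS_PER_IO_EVENT` are hypothetical names, the cap of 25 is borrowed from the iteration limit mentioned in the merged commit message, and the descriptor is assumed to be non-blocking so that a read on an empty socket returns `EAGAIN` instead of stalling.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical cap on reads per I/O event (assumption for this sketch). */
#define MAX_READS_PER_IO_EVENT 25

typedef struct {
    long bytes; /* total bytes consumed during this I/O event */
    int reads;  /* number of read() calls that returned data or EOF */
    int error;  /* 0 on success, otherwise the errno reported by read() */
} drain_result;

/* Read from the primary link repeatedly while each read fills the buffer.
 * A full buffer is treated as replication pressure; a short read means the
 * socket is (probably) drained, so the loop stops. */
static drain_result drain_replication_fd(int fd, char *buf, size_t buflen, int max_reads) {
    drain_result r = {0, 0, 0};
    while (r.reads < max_reads) {
        ssize_t n = read(fd, buf, buflen);
        if (n < 0) {
            if (errno == EINTR) continue; /* interrupted, retry */
            if (errno != EAGAIN && errno != EWOULDBLOCK) r.error = errno;
            break; /* nothing more to read right now, or a real error */
        }
        r.reads++;
        r.bytes += n;
        /* A real server would parse and apply the contents of buf here. */
        if ((size_t)n < buflen) break; /* short read or EOF: no pressure */
    }
    return r;
}
```

Under light load the first read is short, so the loop issues a single read per event, which matches the behaviour without flow control.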

Performance Impact

Test setup:

Bombard the replica with expensive commands, leading to high CPU utilization
Write to the main database to trigger replication traffic

Latency and Throughput Changes

| Metric | Before (repl-flow-control Disabled) | After (repl-flow-control Enabled) |
| --- | --- | --- |
| Throughput (requests/sec) | 941.71 | 760.98 |
| Avg Latency (ms) | 52.865 | 65.534 |
| p50 Latency (ms) | 59.743 | 68.543 |
| p95 Latency (ms) | 79.231 | 106.687 |
| p99 Latency (ms) | 90.303 | 126.527 |
| Max Latency (ms) | 188.031 | 385.535 |

📌 Observations:

Replication stability improves: no full syncs were observed after enabling flow control.
Higher latency for normal clients due to increased resource allocation for replication.
CPU and memory usage remain stable, with no major overhead.
Replica throughput decreases by about 19% (941.71 to 760.98 requests/sec) as replication takes priority.
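
For reference, the relative changes implied by the table can be computed directly. The helpers below are an illustrative sketch (`pct_change` and `print_change` are hypothetical names, not part of the PR); the inputs are the figures reported above.

```c
#include <stdio.h>

/* Relative change from `before` to `after`, in percent. */
static double pct_change(double before, double after) {
    return (after - before) / before * 100.0;
}

/* Print one metric with its signed percentage change. */
static void print_change(const char *metric, double before, double after) {
    printf("%-26s %+7.1f%%\n", metric, pct_change(before, after));
}
```

Calling `print_change("Throughput (requests/sec)", 941.71, 760.98)` and so on for the other rows gives roughly -19.2% throughput, +24.0% average latency, +40.1% p99 latency, and +105.0% max latency.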

TODO

Consider limiting the maximum number of reads per event to a ratio of the total number of events returned by the epoll cycle. For example, if the ratio is 20% and EPOLL returns 100 events, the replica can read from the primary up to 20 times per primary I/O event.
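
The ratio idea in this TODO could look something like the sketch below. Nothing here exists in the PR: `max_reads_for_cycle` and its parameters are hypothetical, and flooring the result at one read is an assumption made so that the primary link is always serviced at least once per event.

```c
/* Hypothetical helper: derive the per-event read cap for the primary link
 * from the number of events returned by one polling cycle.
 * ratio_percent is the share of the cycle granted to replication (0..100). */
static int max_reads_for_cycle(int num_events, int ratio_percent) {
    if (num_events <= 0 || ratio_percent <= 0) return 1; /* always allow one read */
    if (ratio_percent > 100) ratio_percent = 100;
    long cap = (long)num_events * ratio_percent / 100;
    return cap < 1 ? 1 : (int)cap;
}
```

With the example from the TODO, `max_reads_for_cycle(100, 20)` yields 20 reads, while a quiet cycle with only a few events falls back to a single read.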

Implements #1596

codecov · 2025-03-11T17:45:13Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.19%. Comparing base (dfd91bf) to head (c514dca).
Report is 34 commits behind head on unstable.

Additional details and impacted files

@@            Coverage Diff            @@
##           unstable    #1838   +/-   ##
=========================================
  Coverage     71.19%   71.19%           
=========================================
  Files           122      122           
  Lines         66024    66031    +7     
=========================================
+ Hits          47007    47012    +5     
- Misses        19017    19019    +2

Files with missing lines	Coverage Δ
src/networking.c	`87.43% <100.00%> (+0.14%)`	⬆️

... and 13 files with indirect coverage changes


Adds Replication flow control (repl-flow-control) to adjust replication read frequency based on buffer pressure. Helps replicas keep up with replication data and reduces primary buffer utilization and overflows.

- Dynamic replication read scaling based on buffer pressure.
- Reduces full syncs by increasing replication reads when needed.
- Improves replication responsiveness, reduces data staleness.
- Trade-offs: Slightly higher client latency due to replication prioritization.

Replication was handled like a normal client. Under high load in the replica, replication lag increased, making data stale and causing primary buffer overflows, which triggered full syncs and high CPU/memory/I/O usage.

- Fewer full syncs from buffer overruns.
- Lower replication lag, fresher data on replicas.
- More stable primary buffer usage, less swapping.
- Slightly higher client latency due to replication prioritization.

Signed-off-by: xbasel <[email protected]>

valkey.conf

src/server.h

src/networking.c

zuiderkwast

This looks to me like we're fixing a stability issue. Why would anyone want to disable it? Let's discuss if we actually need a config for this. I think we maybe don't need it and we can just keep this always enabled.

I think the benchmark numbers don't give a fair picture. Without this feature, there is a problem of replication lag and for the full sync, it means extra resources used by replica and primary and maybe even extra latency for commands sent to the primary.

Also, this affects only the latency of read-from-replicas. I think this is less important than read-from-primary. If the replica is not fast enough, the client can read from the primary or from another replica. If they do that, it's good because it will reduce the load on the overloaded replica.

artikell · 2025-03-25T13:31:33Z

This seems to be the rate limiting for the control reading stage. There are two scenarios that also need to be considered:

The time-consuming of executing commands is too high. like: ZDIFFSTORE some big key
In the case of multi-threading, will reading not take up more time?

Signed-off-by: xbasel <[email protected]>

simplify impl Signed-off-by: xbasel <[email protected]>

xbasel · 2025-03-26T01:39:18Z

> This seems to be the rate limiting for the control reading stage. There are two scenarios that also need to be considered:
>
> * The time-consuming of executing commands is too high. like: `ZDIFFSTORE` some big key
> * In the case of multi-threading, will reading not take up more time?

Could you clarify what you meant about the control reading stage and those two scenarios?
Do you mean if clients run expensive commands in the replica, this mechanism won't be effective? Or that if we read more from the primary in one go and process expensive commands in the replication context, the replica might become even more unresponsive?

Signed-off-by: xbasel <[email protected]>

artikell · 2025-03-26T02:26:50Z

> > This seems to be the rate limiting for the control reading stage. There are two scenarios that also need to be considered:
> >
> > * The time-consuming of executing commands is too high. like: `ZDIFFSTORE` some big key
> > * In the case of multi-threading, will reading not take up more time?
>
> Could you clarify what you meant about the control reading stage and those two scenarios? Do you mean if clients run expensive commands in the replica, this mechanism won't be effective? Or that if we read more from the primary in one go and process expensive commands in the replication context, the replica might become even more unresponsive?

As you understand, in many cases, expensive commands don't necessarily have a large throughput. But they can also lead to the disconnection between the master and the slave.

src/nohup.out

madolson

I like the new design, mostly minor comments.

valkey.conf

src/networking.c

valkey.conf

Signed-off-by: xbasel <[email protected]>

src/config.c

Signed-off-by: xbasel <[email protected]>

Co-authored-by: Madelyn Olson <[email protected]> Signed-off-by: xbasel <[email protected]>

Signed-off-by: xbasel <[email protected]>

zuiderkwast

This PR became very small and simple without the config. Nice. Just some minor comments.

src/server.h

src/networking.c

Signed-off-by: xbasel <[email protected]>

zuiderkwast

Just nits.

src/networking.c

Co-authored-by: Viktor Söderqvist <[email protected]> Signed-off-by: xbasel <[email protected]>

Signed-off-by: xbasel <[email protected]>

## Overview

In high-load scenarios, a replica might not consume replication data fast enough, leading to backpressure on the primary. As the primary’s buffer fills, replication lag increases; once the buffer overflows, the primary can drop the replica connection, triggering a full sync, a costly operation that impacts system performance.

The solution is to read from replication clients until there is no longer pending data, up to 25 iterations.

## Performance Impact

## Test setup:

1. Bombard the replica with expensive commands, leading to high CPU utilization
2. Write to the main database to trigger replication traffic

| Metric | Before (repl-flow-control Disabled) | After (repl-flow-control Enabled) |
| --- | --- | --- |
| Throughput (requests/sec) | 941.71 | 760.98 |
| Avg Latency (ms) | 52.865 | 65.534 |
| p50 Latency (ms) | 59.743 | 68.543 |
| p95 Latency (ms) | 79.231 | 106.687 |
| p99 Latency (ms) | 90.303 | 126.527 |
| Max Latency (ms) | 188.031 | 385.535 |

- Replication stability improves, no full syncs were observed after enabling flow control.
- Higher latency for normal clients due to increased resource allocation for replication.
- CPU and memory usage remain stable, with no major overhead.
- Replica throughput slightly decreases as replication takes priority.

Implements valkey-io#1596

---------

Signed-off-by: xbasel <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
Signed-off-by: chzhoo <[email protected]>


xbasel marked this pull request as draft March 11, 2025 17:42

xbasel force-pushed the flowcontrol branch from 8bc5f5b to b2783ee Compare March 11, 2025 17:45

xbasel changed the title ~~Replication Flow Control – Prioritizing replica reads to prevent primary buffer overflows and high replication lag~~ Replication Flow Control – Prioritizing replication traffic in the replica side Mar 11, 2025

xbasel force-pushed the flowcontrol branch 5 times, most recently from d52eadb to e3bcd5f Compare March 11, 2025 18:45

xbasel force-pushed the flowcontrol branch from e3bcd5f to e846d9e Compare March 11, 2025 19:18

xbasel mentioned this pull request Mar 11, 2025

[NEW] Introduce Quality-of-Service for the replication stream to reduce full sync as a result of buffer overruns #1596

Open

xbasel changed the title ~~Replication Flow Control – Prioritizing replication traffic in the replica side~~ Replication Flow Control – Prioritizing replication traffic in the replica Mar 11, 2025

xbasel marked this pull request as ready for review March 11, 2025 20:09

hwware reviewed Mar 12, 2025

valkey.conf Outdated

src/server.h Outdated

src/networking.c Outdated

madolson reviewed Mar 12, 2025

src/networking.c Outdated

zuiderkwast reviewed Mar 18, 2025

hwware mentioned this pull request Mar 24, 2025

[NEW] Add Keysizes section, bigkeys and hotkeys, and maxmemory-dataset #1880

Open

xbasel added 5 commits March 26, 2025 03:07

change qb_full_read to bool

6a21b81

Signed-off-by: xbasel <[email protected]>

Merge remote-tracking branch 'origin/unstable' into flowcontrol

e744bc1

rename qb_full_read

cf3cc8b

Signed-off-by: xbasel <[email protected]>

Initialize repl_cur_reads_per_io_event

d9557a8

Signed-off-by: xbasel <[email protected]>

remove ramp-up , apply max rate immediately for flow control. This will

8e3ada0

simplify impl Signed-off-by: xbasel <[email protected]>

xbasel added 2 commits March 26, 2025 03:46

format

8a59e43

Signed-off-by: xbasel <[email protected]>

format

c339066

Signed-off-by: xbasel <[email protected]>

Fusl reviewed Apr 14, 2025

src/nohup.out Outdated

madolson reviewed Apr 14, 2025

valkey.conf Outdated

src/networking.c Outdated

src/networking.c Outdated

src/networking.c Outdated

valkey.conf Outdated

remove repl-flow-control-enabled

fde6110

Signed-off-by: xbasel <[email protected]>

xbasel force-pushed the flowcontrol branch from 8ee9573 to fde6110 Compare April 17, 2025 20:04

fix formatting

a69dd72

Signed-off-by: xbasel <[email protected]>

zuiderkwast added the major-decision-pending Major decision pending by TSC team label May 2, 2025

madolson reviewed May 12, 2025

src/config.c Outdated

madolson added major-decision-approved Major decision approved by TSC team and removed major-decision-pending Major decision pending by TSC team labels May 12, 2025

xbasel and others added 6 commits May 15, 2025 14:26

Remove config

13d38d0

Signed-off-by: xbasel <[email protected]>

Merge remote-tracking branch 'origin/unstable' into flowcontrol

9030fbd

Remove nohup

e282f46

Signed-off-by: xbasel <[email protected]>

Update src/networking.c

10642ab

Co-authored-by: Madelyn Olson <[email protected]> Signed-off-by: xbasel <[email protected]>

Update src/networking.c

a9b66b9

Co-authored-by: Madelyn Olson <[email protected]> Signed-off-by: xbasel <[email protected]>

change comment

d69d383

Signed-off-by: xbasel <[email protected]>

xbasel requested a review from zuiderkwast May 15, 2025 11:55

fix bug

499de53

Signed-off-by: xbasel <[email protected]>

zuiderkwast reviewed May 15, 2025

src/server.h Outdated

src/networking.c Outdated

src/networking.c Outdated

format

68d51b3

Signed-off-by: xbasel <[email protected]>

xbasel force-pushed the flowcontrol branch from 51212ed to 4efccac Compare May 15, 2025 14:37

xbasel requested a review from zuiderkwast May 15, 2025 15:08

simplify

df22a2b

Signed-off-by: xbasel <[email protected]>

xbasel force-pushed the flowcontrol branch from 4efccac to df22a2b Compare May 15, 2025 15:12

zuiderkwast reviewed May 15, 2025

src/networking.c Outdated

src/networking.c Outdated

src/networking.c Outdated

xbasel and others added 2 commits May 16, 2025 01:46

Update src/networking.c

f930b2d

Co-authored-by: Viktor Söderqvist <[email protected]> Signed-off-by: xbasel <[email protected]>

review

c514dca

Signed-off-by: xbasel <[email protected]>

zuiderkwast approved these changes May 16, 2025

zuiderkwast added the release-notes This issue should get a line item in the release notes label May 16, 2025

madolson approved these changes May 28, 2025

madolson merged commit f5b92f5 into valkey-io:unstable May 28, 2025
51 checks passed
