Skip to content

Conversation

@uriyage
Copy link
Contributor

@uriyage uriyage commented Dec 24, 2024

This PR offloads the write to replica clients to IO threads.

Main Changes

  • Replica writes will be offloaded but only after the replica is in online mode..
  • Replica reads will still be done in the main thread to reduce complexity and because read traffic from replicas is negligible.

Implementation Details

In order to offload the writes, writeToReplica has been split into 2 parts:

  1. The write itself made by the IO thread or by the main thread
  2. The post write where we update the replication buffers refcount will be done in the main-thread after the write-job is done in the IO thread (similar to what we do with a regular client)

Additional Changes

  • In writeToReplica we now use writev in case more than 1 buffer exists.
  • Changed client nwritten field to ssize_t since with a replica the nwritten can theoretically exceed int size (not subject to NET_MAX_WRITES_PER_EVENT limit).
  • Changed parsing code to use memchr instead of strchr:
    • During parsing command, ASAN got stuck for unknown reason when called to strchr to look for the next \r
    • Adding assert for null-terminated querybuf didn't resolve the issue.
    • Switched to memchr as it's more secure and resolves the issue

Testing

  • Added integration tests
  • Added unit tests

Related issue: #761

@uriyage uriyage force-pushed the replication-offload branch from 3aee49b to 846c816 Compare December 24, 2024 07:24
@codecov
Copy link

codecov bot commented Dec 24, 2024

Codecov Report

Attention: Patch coverage is 80.45977% with 17 lines in your changes missing coverage. Please review.

Project coverage is 71.06%. Comparing base (0579103) to head (b445576).
Report is 6 commits behind head on unstable.

Files with missing lines Patch % Lines
src/io_threads.c 0.00% 12 Missing ⚠️
src/networking.c 93.33% 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1485      +/-   ##
============================================
- Coverage     71.06%   71.06%   -0.01%     
============================================
  Files           121      121              
  Lines         65237    65311      +74     
============================================
+ Hits          46360    46411      +51     
- Misses        18877    18900      +23     
Files with missing lines Coverage Δ
src/replication.c 87.39% <ø> (+0.11%) ⬆️
src/server.h 100.00% <ø> (ø)
src/networking.c 88.88% <93.33%> (-0.08%) ⬇️
src/io_threads.c 6.82% <0.00%> (-0.12%) ⬇️

... and 14 files with indirect coverage changes

Copy link
Member

@xbasel xbasel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi - I tested this code and I got this error. I did not see it with HEAD~1

2576783:S 06 Jan 2025 16:06:12.381 # Protocol error (Master using the inline protocol. Desync?) from client: id=6 addr=127.0.0.1:6379 laddr=127.0.0.1:56284 fd=11 name=*redacted* age=43 idle=0 flags=M db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=22870 qbuf-free=18084 argv-mem=0 multi-mem=0 rbs=1024 rbp=42 obl=0 oll=0 omem=0 tot-mem=42880 events=r cmd=set user=*redacted* redir=-1 resp=2 lib-name= lib-ver= tot-net-in=218001645 tot-net-out=1809 tot-cmds=1267450. Query buffer during protocol error: 'SET..$16..key:__rand_int__..$128..VXKeHogKgJ=[5V9_X^b?48OKF2jGA<' (... more 896 bytes ...) 'mcS2^N1J?ELSX@CfKQ7cM5aea\ngY8a3LGgNVa9eRA46XS8>7ABe1>Jl9O\Rm\..'

I did run: src/valkey-benchmark -t set -d 128 -n 5000000 --threads 10
valkey with 4 io threads

Looks like data is corrupted, look at the second message:
*3\r $3\r SET\r $16\r key:000000630274\r $128\r VXKeHogKgJ=[5V9_X^b?48OKF2jGA<f:iR@50o7dS3JV4Q6L68lC[GTA]0DaMg?_oSmcS2^N1J?ELSX@CfKQ7cM5aea\\ngY8a3LGgNVa9eRA46XS8>7ABe1>Jl9O\\Rm\\\r *3\r $3\r SET\r $16\r key:000000420097\r $128\r VXKeHogKgJ=[5V9_X^b?48OKF2jGA<f:iR@50o7dS3JV4Q6L68lC[GTA]0DaMg?_oSmcS2^N1J?ELSX@CfKQ7cM5aea\\ngY8a3LGgNVa9eRA46XS8>7ABe1>Jl9O\\Rm\\\r o7dS3JV4Q6L68lC[GTA]0DaMg?_oSmcS2^N1J?ELSX@CfKQ7cM5aea\\ngY8a3LGgNVa9eRA46XS8>7ABe1>Jl9O\\Rm\\\r

The payload o7dS3JV4Q6L68lC[GTA]0DaMg?_oSmcS2^N1J?ELSX@CfKQ7cM5aea\\ngY8a3LGgNVa9eRA46XS8>7ABe1>Jl9O\\Rm\\\r is duplicated.

@uriyage
Copy link
Contributor Author

uriyage commented Jan 8, 2025

fyi - I tested this code and I got this error. I did not see it with HEAD~1

Many thanks @xbasel for finding it.
I fixed the issue, it was a stop condition that was mistakenly removed.

@madolson madolson requested review from madolson and ranshid February 3, 2025 15:56
Copy link
Member

@ranshid ranshid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uriyage Please just resolve the conlicts. We will merge it right after the tests are passing

@uriyage uriyage force-pushed the replication-offload branch from 76576ac to 2fb5d11 Compare February 6, 2025 16:26
@uriyage
Copy link
Contributor Author

uriyage commented Feb 6, 2025

@uriyage Please just resolve the conlicts. We will merge it right after the tests are passing

@ranshid, Done.

Signed-off-by: Uri Yagelnik <[email protected]>
Signed-off-by: Uri Yagelnik <[email protected]>
Signed-off-by: Uri Yagelnik <[email protected]>
@uriyage uriyage force-pushed the replication-offload branch from 2fb5d11 to 46202aa Compare February 9, 2025 09:37
Signed-off-by: Uri Yagelnik <[email protected]>
@ranshid ranshid merged commit de3672a into valkey-io:unstable Feb 9, 2025
50 checks passed
@zuiderkwast
Copy link
Contributor

Use-after-free in unit test (ASAN & Valgrind), in freeReplicaReferencedReplBuffer.

https://github.com/valkey-io/valkey/actions/runs/13230922385/job/36927928217#step:10:147

@uriyage
Copy link
Contributor Author

uriyage commented Feb 10, 2025

Use-after-free in unit test (ASAN & Valgrind), in freeReplicaReferencedReplBuffer.

https://github.com/valkey-io/valkey/actions/runs/13230922385/job/36927928217#step:10:147

Fixed in: https://github.com/valkey-io/valkey/pull/1697/files

xbasel pushed a commit to xbasel/valkey that referenced this pull request Mar 27, 2025
This PR offloads the write to replica clients to IO threads.

## Main Changes

* Replica writes will be offloaded but only after the replica is in
online mode..
* Replica reads will still be done in the main thread to reduce
complexity and because read traffic from replicas is negligible.
### Implementation Details

In order to offload the writes, `writeToReplica` has been split into 2
parts:
1. The write itself made by the IO thread or by the main thread
2. The post write where we update the replication buffers refcount will
be done in the main-thread after the write-job is done in the IO thread
(similar to what we do with a regular client)

### Additional Changes

* In `writeToReplica` we now use `writev` in case more than 1 buffer
exists.
* Changed client `nwritten` field to `ssize_t` since with a replica the
`nwritten` can theoretically exceed `int` size (not subject to
`NET_MAX_WRITES_PER_EVENT` limit).
* Changed parsing code to use `memchr` instead of `strchr`:
* During parsing command, ASAN got stuck for unknown reason when called
to `strchr` to look for the next `\r`
  * Adding assert for null-terminated querybuf didn't resolve the issue.
  * Switched to `memchr` as it's more secure and resolves the issue

### Testing
* Added integration tests
* Added unit tests

**Related issue:** valkey-io#761

---------

Signed-off-by: Uri Yagelnik <[email protected]>
xbasel pushed a commit to xbasel/valkey that referenced this pull request Mar 27, 2025
This PR offloads the write to replica clients to IO threads.

## Main Changes

* Replica writes will be offloaded but only after the replica is in
online mode..
* Replica reads will still be done in the main thread to reduce
complexity and because read traffic from replicas is negligible.
### Implementation Details

In order to offload the writes, `writeToReplica` has been split into 2
parts:
1. The write itself made by the IO thread or by the main thread
2. The post write where we update the replication buffers refcount will
be done in the main-thread after the write-job is done in the IO thread
(similar to what we do with a regular client)

### Additional Changes

* In `writeToReplica` we now use `writev` in case more than 1 buffer
exists.
* Changed client `nwritten` field to `ssize_t` since with a replica the
`nwritten` can theoretically exceed `int` size (not subject to
`NET_MAX_WRITES_PER_EVENT` limit).
* Changed parsing code to use `memchr` instead of `strchr`:
* During parsing command, ASAN got stuck for unknown reason when called
to `strchr` to look for the next `\r`
  * Adding assert for null-terminated querybuf didn't resolve the issue.
  * Switched to `memchr` as it's more secure and resolves the issue

### Testing
* Added integration tests
* Added unit tests

**Related issue:** valkey-io#761

---------

Signed-off-by: Uri Yagelnik <[email protected]>
murphyjacob4 pushed a commit to enjoy-binbin/valkey that referenced this pull request Apr 13, 2025
This PR offloads the write to replica clients to IO threads.

## Main Changes

* Replica writes will be offloaded but only after the replica is in
online mode..
* Replica reads will still be done in the main thread to reduce
complexity and because read traffic from replicas is negligible.
### Implementation Details

In order to offload the writes, `writeToReplica` has been split into 2
parts:
1. The write itself made by the IO thread or by the main thread
2. The post write where we update the replication buffers refcount will
be done in the main-thread after the write-job is done in the IO thread
(similar to what we do with a regular client)

### Additional Changes

* In `writeToReplica` we now use `writev` in case more than 1 buffer
exists.
* Changed client `nwritten` field to `ssize_t` since with a replica the
`nwritten` can theoretically exceed `int` size (not subject to
`NET_MAX_WRITES_PER_EVENT` limit).
* Changed parsing code to use `memchr` instead of `strchr`:
* During parsing command, ASAN got stuck for unknown reason when called
to `strchr` to look for the next `\r`
  * Adding assert for null-terminated querybuf didn't resolve the issue.
  * Switched to `memchr` as it's more secure and resolves the issue

### Testing
* Added integration tests
* Added unit tests

**Related issue:** valkey-io#761

---------

Signed-off-by: Uri Yagelnik <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants