Allow partial sync after loading AOF with preamble #2366
Thanks for your interest. The main purpose of this PR is to clarify why this was missed for the AOF preamble, and to make sure that adding it does not break data consistency or replication. I will try to add more test cases.
@secwall I still cannot reproduce an error after a weekend of testing. Can you give some hints on how to reproduce it?
Here is how I get an error. It fails on the failover loop test:
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff             @@
##           unstable    #2366    +/-   ##
============================================
- Coverage     72.64%   72.37%   -0.27%
============================================
  Files           128      128
  Lines         71278    70284    -994
============================================
- Hits          51777    50866    -911
+ Misses        19501    19418     -83
@arthurkiller can you also sign off the commits?
@ranshid Done. @secwall I've fixed this. The cause is that repl_buffer is only created when a replica attaches; we should create it on data load instead. I'll add a Tcl test case for this later. I'm not sure whether using the replid from the previous RDB will have any side effects.
Currently, our cluster test cases do not cover this situation. I still failed to reproduce it by running the cluster tests.
Great finding @arthurkiller! One of the main points of AOF is to be able to avoid full sync after a restart, so it's pretty sad that it doesn't work and isn't tested. Maybe this is why our documentation about persistence says this:
It's too late to get this into 9.0, but I hope we can have it ready for 9.1. Will you add a test of the scenario (AOF rewrite, restart, psync)?
@zuiderkwast Thank you for your attention. I almost forgot about this pull request :). I'll try to add more comprehensive tests as soon as possible. Things are hectic these days.
When you have time, will you add these two in some file under
We should use an AOF file with a preamble and multiple AOF commands after the preamble to check that the offset is calculated correctly.
zuiderkwast
left a comment
Some hints for the failed CI jobs.
We can ignore this one because it's a known flaky test: https://github.com/valkey-io/valkey/actions/runs/18647217373/job/53157000675?pr=2366#step:4:8645
I have fixed all the test cases, and they now run stably.
Not sure what the problem is, but maybe this helps. Before you restart, you can get the number of lines in the log file using

    set logfile [srv 0 stdout]
    set num_lines [lindex [exec wc -l $logfile] 0]
    # ...
    restart_server
    # ...
    wait_for_log_messages 0 [list {*Pattern*}] $num_lines 100 100

Another way is to count the matching lines before and after restart using
@zuiderkwast I think we are almost done, |
zuiderkwast
left a comment
Very good, thank you for your patience!
I have only very minor comments about comments. (I can apply them before merging, if you don't.)
Thanks a lot :) |
The AOF preamble mechanism replaces the traditional AOF base file with an RDB snapshot during rewrite operations, which reduces I/O overhead and improves loading performance.

However, when valkey loads the RDB-formatted preamble from the base AOF file, it does not process the replication ID (replid) information within the RDB AUX fields. This omission has two limitations:

* On a primary, it prevents the primary from accepting PSYNC continue requests after restarting with a preamble-enabled AOF file.
* On a replica, it prevents the replica from successfully performing partial sync requests (avoiding full sync) after restarting with a preamble-enabled AOF file.

To resolve this, this commit aligns the AOF preamble handling with the logic used for standalone RDB files, by storing the replication ID and replication offset in the AOF preamble and restoring them when loading the AOF file.

Resolves valkey-io#2677

---------

Signed-off-by: arthur.lee <[email protected]>
Signed-off-by: Arthur Lee <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
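
To make the two limitations above concrete, here is a minimal Python model of the decision a restarted node makes when it receives a PSYNC request. It is only a sketch under stated assumptions, not the valkey implementation: the names `Node`, `accepts_psync`, and `restart_from_aof` are hypothetical, the backlog is reduced to a pair of offsets, and the secondary replication ID is ignored. It also assumes, as discussed earlier in this thread, that the backlog has to exist right after data load rather than only once a replica attaches.

```python
import secrets
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """Hypothetical, simplified view of a node's replication state."""
    replid: str                         # current replication ID (40 hex chars)
    repl_offset: int                    # last replication offset applied
    backlog: Optional[Tuple[int, int]]  # (first offset, next offset) or None

    def accepts_psync(self, want_replid: str, want_offset: int) -> bool:
        """True means partial sync (CONTINUE); False means full sync."""
        if self.backlog is None or want_replid != self.replid:
            return False
        first, next_offset = self.backlog
        return first <= want_offset <= next_offset


def restart_from_aof(saved_replid: str, saved_offset: int, restore: bool) -> Node:
    """Model a restart that loads an AOF file with an RDB preamble.

    restore=False: the replid/offset in the preamble are ignored, so the
                   node starts with a fresh random replid and no backlog.
    restore=True:  the saved replid/offset are restored and an empty
                   backlog is created at the next expected offset.
    """
    if restore:
        nxt = saved_offset + 1
        return Node(saved_replid, saved_offset, (nxt, nxt))
    return Node(secrets.token_hex(20), 0, None)


REPLID = "a" * 40  # made-up replication ID for the example

before = restart_from_aof(REPLID, 1000, restore=False)
after = restart_from_aof(REPLID, 1000, restore=True)

# A replica that has applied offset 1000 asks for the next byte, 1001.
print(before.accepts_psync(REPLID, 1001))  # False: replid was not restored
print(after.accepts_psync(REPLID, 1001))   # True: replid and offset match
print(after.accepts_psync(REPLID, 900))    # False: offset is before the backlog
```

In this model, a replica that fell behind before the restart still needs a full sync, because the restored backlog is empty and only covers the next expected offset.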