Separate RDB snapshotting from atomic slot migration #2533
Signed-off-by: Jacob Murphy <[email protected]>
Signed-off-by: Binbin <[email protected]>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           unstable    #2533    +/-   ##
============================================
+ Coverage     72.21%   72.23%   +0.01%
============================================
  Files           127      127
  Lines         70826    70934     +108
============================================
+ Hits          51147    51237      +90
- Misses        19679    19697      +18
Co-authored-by: Jacob Murphy <[email protected]> Signed-off-by: Binbin <[email protected]>
murphyjacob4
left a comment
@madolson - I think you mentioned during this week's meeting that you had some opinions on the INFO fields?
4b4a508 to a8b0994
    addReplyBulkCString(c, "message");
    addReplyBulkCString(c, job->status_msg ? job->status_msg : "");
    addReplyBulkCString(c, "cow_size");
    addReplyLongLong(c, (long long)job->stat_cow_bytes);
Let's also expose the output buffer size here? We may be missing this information right now, and people need it to monitor the migration's progress. (Or we could do this in another PR by exposing both the import slot client and the export slot client in the client info, i.e. by adding a client flag. But I can't think of a good flag char, since import source has already taken the 'I' char.)
We can probably go with 'i' and 'e', standing for import and export. But no_evict has already taken 'e'.
So we can go with 'i' and 'm', standing for importing and migrating, as in the old terminology.
To monitor it, you would use CLIENT LIST? 🤔 I guess that's possible, yes, but it may be easier to use if we put some progress information in CLUSTER GETSLOTMIGRATIONS.
Maybe we can do it at the same time as #2504, if we want valkey-cli to print a progress indicator in interactive mode (when stdout is a TTY).
When we added atomic slot migration (ASM) in valkey-io#1949, we reused a lot of the RDB save code. That was the easier way to implement ASM the first time, but it came with some side effects. We use CHILD_TYPE_RDB for the fork and the rdb.c/rdb.h functions to save the snapshot, which pollutes the logs (we print messages saying we are doing RDB work) and the info fields (we report rdb_bgsave_in_progress when we are actually doing slot migration).

In addition, it makes the code difficult to maintain. The rdb_save method uses a lot of rdb_* variables, even though we are actually doing slot migration. If we want to support one fork with multiple target nodes, we need to rewrite this code for a better cleanup.

Note that the changes to rdb.c/rdb.h revert earlier changes made when we were reusing this code for slot migration. The slot migration snapshot logic is similar to the previous diskless replication: we use a pipe to transfer the snapshot data from the child process to the parent process.

Interface changes:
- New slot_migration_fork_in_progress info field.
- New cow_size field in CLUSTER GETSLOTMIGRATIONS command.
- Also add slot migration fork to the cluster class trace latency.

Signed-off-by: Binbin <[email protected]> Signed-off-by: Jacob Murphy <[email protected]> Co-authored-by: Jacob Murphy <[email protected]>
When we added atomic slot migration (ASM) in #1949, we reused a lot of the RDB
save code. That was the easier way to implement ASM the first time, but it came
with some side effects. We use CHILD_TYPE_RDB for the fork and the rdb.c/rdb.h
functions to save the snapshot, which pollutes the logs (we print messages
saying we are doing RDB work) and the info fields (we report
rdb_bgsave_in_progress when we are actually doing slot migration).
In addition, it makes the code difficult to maintain. The rdb_save method uses
a lot of rdb_* variables, even though we are actually doing slot migration. If
we want to support one fork with multiple target nodes, we need to rewrite this
code for a better cleanup.
Note that the changes to rdb.c/rdb.h revert earlier changes made when we were
reusing this code for slot migration. The slot migration snapshot logic is
similar to the previous diskless replication: we use a pipe to transfer the
snapshot data from the child process to the parent process.
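As a rough illustration of the fork-and-pipe pattern described above (not the code from this PR), here is a minimal POSIX sketch. The names write_snapshot and snapshot_via_pipe and the payload string are hypothetical stand-ins; the real child would serialize the keys of the migrating slots, and the real parent would forward the bytes to the target node rather than collect them in a buffer.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for serializing the migrating slots: the child
 * writes a fixed payload to the write end of the pipe. */
static int write_snapshot(int fd) {
    const char *payload = "slot-snapshot-bytes";
    size_t len = strlen(payload);
    while (len > 0) {
        ssize_t n = write(fd, payload, len);
        if (n <= 0) return -1;
        payload += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Fork a child that produces the snapshot and read it in the parent
 * through a pipe. Returns the number of bytes read, or -1 on error. */
static ssize_t snapshot_via_pipe(char *buf, size_t cap) {
    int fds[2];
    if (pipe(fds) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: works on a copy-on-write view of memory, only writes. */
        close(fds[0]);
        int rc = write_snapshot(fds[1]);
        close(fds[1]);
        _exit(rc == 0 ? 0 : 1);
    }

    /* Parent: close the write end so read() sees EOF when the child exits. */
    close(fds[1]);
    size_t total = 0;
    int failed = 0;
    while (total < cap) {
        ssize_t n = read(fds[0], buf + total, cap - total);
        if (n < 0) failed = 1;
        if (n <= 0) break;
        total += (size_t)n;
    }
    close(fds[0]);

    /* Reap the child and treat a non-zero exit as a failed snapshot. */
    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    if (failed || !WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;
    return (ssize_t)total;
}
```

A caller would pass a buffer, for example `char buf[64]; ssize_t n = snapshot_via_pipe(buf, sizeof(buf));`, and check for a negative return. A server would normally read the pipe from its event loop instead of blocking like this.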
Interface changes: