raftstore-v2: compact and gc raft logs #13846
Conversation
Signed-off-by: tabokie <[email protected]>
Logs can't be GCed until their data are all applied and flushed, so log GC depends on removing the WAL. I suggest continuing this work after WAL removal is supported.
```
            RAFT_LOG_GC_DELETED_KEYS_HISTOGRAM.observe(n as f64);
        }
    }
    if let Err(e) = self.engines.raft.consume(&mut batch, false) {
```
Maybe you can keep most of the original flush by:

```
fn gc_raft_log() {
    ...
    consume(batch)
    ...
}
fn flush() {
    ...
    for _ in _ {
        engine.gc(batch);
    }
    ...
}
```

So most of the code is unchanged and easier to understand.
If it's the failpoint bothering you, the old code is simply wrong: it skips writing the log batch but doesn't skip consuming the callbacks. That is not the intended behavior, as per tikv/tests/failpoints/cases/test_snap.rs, lines 701 to 702 in 8864e9b.
It's actually the expected behavior, because we just ignore any write errors and only print a log. The callbacks are used to ensure there are no more background writes to the same region.
Regardless of the behavior, it is not what the test comment says, because the tasks are lost forever. For RocksDB GC that means some entries will never be compacted again. We will change the failpoint comment if that's what you prefer.
The return action is to simulate the case where the node is stopped before completing GC. It would be more appropriate to use "pause", but "pause" would stop the test case from restarting the node.
On the other hand, writing to the raft engine can fail. If that error happens, we just ignore it for now. I think these are two different things.
```
        self.fsm.peer.mut_store().evict_entry_cache(true);
    }
    let mut _usage = 0;
    if memory_usage_reaches_high_water(&mut _usage)
```
In the next tick it will call needs_evict_entry_cache again, which calls memory_usage_reaches_high_water inside. I think checking again later makes more sense than checking right after a compaction.
Also, this further hides the function memory_usage_reaches_high_water and keeps the pattern consistent with tikv/components/raftstore/src/store/fsm/peer.rs, lines 5350 to 5355 in 8864e9b.
```
        self.entry_storage().truncated_index(),
        self.storage().apply_trace().persisted_apply_index(),
    );
    if compact_index > self.last_engine_compact_log_index() {
```
last_engine_compact_log_index may not be necessary. For on_ready_res, it should compact logs if truncated_index < persisted_applied before applying the updates; for record_apply_trace, it should compact logs if persisted_applied < truncated_index.
You are right, but it makes the code a little more complex, and it introduces an extra dependency between the caller and the callee.
It also wouldn't work if one index advances past the other.
Why? For example, suppose truncated_index advances from 3 to 10 in on_ready_res while persisted_applied is 5. Because 3 < 5, it should GC logs using cmp::min(10, 5) = 5.
I see. It is still very complex.
It's just two if checks, in exchange for getting rid of a field (and two other if checks).
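To make the proposal concrete, the two checks could look roughly like this (a minimal sketch; the free functions and their signatures are hypothetical stand-ins for logic inside on_ready_res and record_apply_trace):

```rust
// Each function returns the index to compact to, or None when no
// compaction should be triggered.

// Called when the truncated index advances (on_ready_res): compaction is
// pending only if the old truncated index was behind persisted_applied.
fn on_truncated_advanced(old_truncated: u64, new_truncated: u64, persisted_applied: u64) -> Option<u64> {
    if old_truncated < persisted_applied {
        Some(std::cmp::min(new_truncated, persisted_applied))
    } else {
        None
    }
}

// Called when the persisted applied index advances (record_apply_trace).
fn on_persisted_advanced(old_persisted: u64, new_persisted: u64, truncated: u64) -> Option<u64> {
    if old_persisted < truncated {
        Some(std::cmp::min(truncated, new_persisted))
    } else {
        None
    }
}

fn main() {
    // The example from the discussion: truncated advances 3 -> 10 while
    // persisted_applied is 5, so logs are GCed up to min(10, 5) = 5.
    assert_eq!(on_truncated_advanced(3, 10, 5), Some(5));
}
```

This trades the stored last_engine_compact_log_index field for comparing the old value of each index at its two update sites.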
```
    if found_flush_state[cf_id as usize] {
        batch.0.delete(raft_group_id, key.to_vec());
    } else {
        found_flush_state[cf_id as usize] = true;
```
This will index out of bounds if cf_id == MAX_CF_ID.
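The hazard is the classic inclusive-range sizing mistake (a sketch; MAX_CF_ID here is a hypothetical stand-in for the real constant): if CF ids range over 0..=MAX_CF_ID inclusive, an array with MAX_CF_ID slots has no room for the last id.

```rust
const MAX_CF_ID: usize = 3; // hypothetical; stands in for the real constant

// Correct sizing: one slot per id, i.e. MAX_CF_ID + 1 entries.
fn new_flush_state() -> Vec<bool> {
    vec![false; MAX_CF_ID + 1]
}

fn main() {
    let wrong = vec![false; MAX_CF_ID];
    // Indexing `wrong[MAX_CF_ID]` would panic with "index out of bounds".
    assert!(wrong.get(MAX_CF_ID).is_none());

    let right = new_flush_state();
    assert!(right.get(MAX_CF_ID).is_some());
}
```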
```
    .and_then(|_| {
        store_ctx
            .engine
            .delete_all_but_one_states_before(region_id, compact_index, lb)
```
I don't get it. It should delete those states whenever persisted_applied advances. For example, even if the truncated index is 3, it can delete the state at 6 when the applied index is 10 and there is another state at 7.
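One reading of the intended retention, sketched over plain indexes (a toy model; the real delete_all_but_one_states_before operates on engine keys): everything at or after the cutoff is kept, plus the single newest state strictly below it, so recovery always has a starting state.

```rust
// Toy model: keep every state index >= cutoff, plus the newest one below it.
fn retained_states(states: &[u64], cutoff: u64) -> Vec<u64> {
    let last_before = states.iter().copied().filter(|&i| i < cutoff).max();
    states
        .iter()
        .copied()
        .filter(|&i| i >= cutoff || Some(i) == last_before)
        .collect()
}

fn main() {
    // The example above: states at 3, 6 and 7 with applied index 10 as the
    // cutoff. The states at 3 and 6 can be deleted because 7 is newer.
    assert_eq!(retained_states(&[3, 6, 7], 10), vec![7]);
}
```

Under this model, advancing the cutoff to the persisted applied index (rather than the truncated index) deletes strictly more stale states.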
```
    }

    let data = req.write_to_bytes().unwrap();
    self.propose_with_ctx(store_ctx, data, vec![])
```
Suggested change:

```diff
-    self.propose_with_ctx(store_ctx, data, vec![])
+    self.propose(store_ctx, data)
```
```
    }
    // TODO: check is_merging
    // compact failure is safe to be omitted, no need to assert.
    if res.compact_index <= self.entry_storage().truncated_index() {
```
This is covered by checking first_index.
```
        .apply_state()
        .get_truncated_state()
        .get_index();
    self.entry_storage_mut()
```
Saving apply_state_mut in a temporary can reduce code.
```
        self.set_has_extra_write();
    }
}
self.maybe_compact_log_from_engine(store_ctx, Either::Left(old_persisted));
```
I think it should be included in the if above.
```
        Either::Right(old_truncated) if old_truncated >= persisted => return,
        _ => {}
    }
    let compact_index = std::cmp::min(
```
You mean `let compact_index = std::cmp::min(truncated, persisted)`?
```
    if let Some(last_heartbeat) = self.peer_heartbeats.get(&peer_id)
        && *last_heartbeat >= *deadline
    {
        return true;
    }
    false
}
```
You mean `matches!(self.peer_heartbeats.get(&peer_id), Some(last_heartbeat) if *last_heartbeat >= *deadline)`?
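The two forms are equivalent; a minimal sketch with a plain HashMap<u64, u64> standing in for peer_heartbeats (and owned u64 deadlines instead of borrowed timestamps) shows the rewrite:

```rust
use std::collections::HashMap;

// if-let form, as in the diff (using a nested if instead of a let chain,
// so the sketch compiles on stable Rust).
fn is_alive_if_let(heartbeats: &HashMap<u64, u64>, peer_id: u64, deadline: u64) -> bool {
    if let Some(last_heartbeat) = heartbeats.get(&peer_id) {
        if *last_heartbeat >= deadline {
            return true;
        }
    }
    false
}

// matches! form, as suggested: one expression, pattern plus guard.
fn is_alive_matches(heartbeats: &HashMap<u64, u64>, peer_id: u64, deadline: u64) -> bool {
    matches!(heartbeats.get(&peer_id), Some(last_heartbeat) if *last_heartbeat >= deadline)
}

fn main() {
    let mut hb = HashMap::new();
    hb.insert(1u64, 100u64);
    // Both forms agree for present/absent peers and both sides of the deadline.
    for (peer, deadline) in [(1, 50), (1, 150), (2, 50)] {
        assert_eq!(
            is_alive_if_let(&hb, peer, deadline),
            is_alive_matches(&hb, peer, deadline)
        );
    }
}
```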
```
        }
    }
    RAFT_LOG_GC_WRITE_DURATION_HISTOGRAM.observe(start.saturating_elapsed_secs());
    for cb in cbs {
```
The callbacks are not collected and invoked.
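The fix amounts to stashing each task's callback while the write batch is built and draining them after the write completes (a sketch with toy types; the real tasks carry more state than a single id):

```rust
// Callbacks gathered during task processing must be invoked after the
// batched write, otherwise their side effects are silently dropped.
fn run_batch(tasks: Vec<u64>) -> Vec<u64> {
    let mut cbs: Vec<Box<dyn FnOnce(&mut Vec<u64>)>> = Vec::new();
    for t in tasks {
        // While building the write batch, stash each task's callback.
        cbs.push(Box::new(move |done| done.push(t)));
    }
    // ... the batched engine write would happen here ...
    // Afterwards, drain and invoke every collected callback.
    let mut done = Vec::new();
    for cb in cbs {
        cb(&mut done);
    }
    done
}

fn main() {
    assert_eq!(run_batch(vec![1, 2, 3]), vec![1, 2, 3]);
}
```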
Latest commit needs review. @bufferflies @BusyJay
/merge |
This pull request has been accepted and is ready to merge. Commit hash: 97f9300
What is changed and how it works?
Issue Number: Ref #12842
What's Changed:
Related changes
Check List
Tests
Release note