gc_worker: use async_snapshot instead of raw API in GC #13322
ti-chi-bot merged 48 commits into tikv:master
Conversation
Signed-off-by: SpadeA-Tang <[email protected]>
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review. Reviewers can indicate their review by submitting an approval review. The full list of commands accepted by this bot can be found here.
/test
    self.pd_client.feature_gate().clone(),
    Arc::new(self.region_info_accessor.clone()),
);
gc_worker
The starting of the gc_worker is delayed to the init_servers function, just before start_auto_gc. The purpose seems to be getting the store id when starting the gc_worker. Delaying the start of gc_worker shouldn't cause any severe problem as long as the tests pass, but I would suggest wrapping the three "start" functions into a single start_gc_worker function.
src/server/gc_worker/gc_worker.rs
Outdated
.debug_struct("Gc")
.field("start_key", &log_wrappers::Value::key(start_key))
.field("end_key", &log_wrappers::Value::key(end_key))
.field(
Why not .field("region", region)?
To stay consistent with the previous behavior. The other information in Region may not be needed.
src/server/gc_worker/gc_worker.rs
Outdated
// all regions of those keys.
// We return an iterator which yields items of `Key` and the region that the
// key is located in.
fn get_keys_in_regions(
This seems a bit overcomplicated. Why not just use two for loops?
for region in regions {
let keys = get_keys_in_region(&keys, region);
// process keys
}
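The two-loop shape the reviewer suggests can be sketched like this. This is a toy stand-in, not the PR's code: the `Region` struct, byte-vector keys, and `get_keys_in_region` below are simplified assumptions.

```rust
// Simplified stand-in for a region's key range; empty end_key means unbounded.
#[derive(Debug, Clone, PartialEq)]
struct Region {
    start_key: Vec<u8>,
    end_key: Vec<u8>,
}

// Return the keys that fall inside the region's [start_key, end_key) range.
fn get_keys_in_region(keys: &[Vec<u8>], region: &Region) -> Vec<Vec<u8>> {
    keys.iter()
        .filter(|k| {
            **k >= region.start_key
                && (region.end_key.is_empty() || **k < region.end_key)
        })
        .cloned()
        .collect()
}

fn main() {
    let keys: Vec<Vec<u8>> = vec![b"a".to_vec(), b"c".to_vec(), b"e".to_vec()];
    let regions = vec![
        Region { start_key: b"a".to_vec(), end_key: b"d".to_vec() },
        Region { start_key: b"d".to_vec(), end_key: vec![] },
    ];
    // Outer loop over regions, inner pass over keys — no custom iterator needed.
    for region in &regions {
        let keys_in_region = get_keys_in_region(&keys, region);
        println!("{:?} -> {:?}", region.start_key, keys_in_region);
    }
}
```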
}
#[test]
fn test_stale_read_with_ts0() {
If no failpoint is used, then it should be put in the integration test directory.
let store_id = 1;
let mut region = metapb::Region::default();
region.set_peers(RepeatedField::from_vec(vec![metapb::Peer {
You mean region.mut_peers().push(new_peer(store_id, 0))?
There are still many places using RepeatedField.
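For illustration, here is the difference between the two styles using hand-rolled stand-ins for the generated protobuf types. The `Region`/`Peer` structs and `new_peer` helper below are hypothetical simplifications, not the real kvproto-generated code; they only mirror its accessor shape.

```rust
// Hand-rolled stand-ins mimicking the accessors of generated protobuf messages.
#[derive(Default, Debug, Clone, PartialEq)]
struct Peer {
    store_id: u64,
    id: u64,
}

#[derive(Default, Debug)]
struct Region {
    peers: Vec<Peer>,
}

impl Region {
    // Generated code exposes a mutable accessor to the repeated field.
    fn mut_peers(&mut self) -> &mut Vec<Peer> {
        &mut self.peers
    }
    fn set_peers(&mut self, peers: Vec<Peer>) {
        self.peers = peers;
    }
}

fn new_peer(store_id: u64, id: u64) -> Peer {
    Peer { store_id, id }
}

fn main() {
    let store_id = 1;

    // Style under review: build the whole list up front and set it.
    let mut a = Region::default();
    a.set_peers(vec![Peer { store_id, ..Default::default() }]);

    // Reviewer's suggestion: push directly into the repeated field.
    let mut b = Region::default();
    b.mut_peers().push(new_peer(store_id, 0));

    assert_eq!(a.peers, b.peers);
    println!("both styles produce {:?}", b.peers);
}
```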
components/test_storage/Cargo.toml
Outdated
futures = "0.3"
kvproto = { git = "https://github.com/pingcap/kvproto.git" }
pd_client = { path = "../pd_client", default-features = false }
protobuf = { version = "2.8", features = ["bytes"] }
This dependency should not be added.
let safe_point = safe_point.into();
for _ in 0..3 {
    let ret = self.store.gc(self.ctx.clone(), safe_point);
    let ret = self.store.gc(region, self.ctx.clone(), safe_point);
Why not just region.clone()? So you don't need to change the other function's definition.
The region may be changed.
src/server/gc_worker/gc_manager.rs
Outdated
if region.is_none() {
    return Ok(None);
};
let region = region.unwrap();
Suggested change:
let Some(region) = region else { return Ok(None) };
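The suggested `let ... else` form (stable since Rust 1.65) collapses the `is_none()` check plus `unwrap()` into one binding. A self-contained toy sketch, not the PR's code:

```rust
// Verbose form: explicit is_none() check followed by unwrap().
fn first_even_verbose(nums: &[i32]) -> Option<i32> {
    let found = nums.iter().find(|n| **n % 2 == 0);
    if found.is_none() {
        return None;
    }
    let found = found.unwrap();
    Some(found * 10)
}

// let-else form: bind on the Some case, diverge otherwise.
fn first_even_let_else(nums: &[i32]) -> Option<i32> {
    let Some(found) = nums.iter().find(|n| **n % 2 == 0) else {
        return None;
    };
    Some(found * 10)
}

fn main() {
    assert_eq!(first_even_verbose(&[1, 3, 4]), Some(40));
    assert_eq!(first_even_let_else(&[1, 3, 4]), Some(40));
    assert_eq!(first_even_let_else(&[1, 3]), None);
    println!("both forms agree");
}
```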
src/server/gc_worker/gc_worker.rs
Outdated
.debug_struct("Gc")
.field("start_key", &log_wrappers::Value::key(start_key))
.field("end_key", &log_wrappers::Value::key(end_key))
.field(
src/server/gc_worker/gc_worker.rs
Outdated
@@ -247,40 +247,129 @@ struct KeysInRegions<R: Iterator<Item = Region>> {
}

impl<R: Iterator<Item = Region>> Iterator for KeysInRegions<R> {
src/server/gc_worker/gc_worker.rs
Outdated
fn get_regions_for_gc(
    store_id: u64,
    keys: &Vec<Key>,
    region_or_provider: Either<Region, Arc<dyn RegionInfoProvider>>,
It can just take Arc<dyn RegionInfoProvider>. If it's Either::Left, there is no need to call this method.
src/server/gc_worker/gc_worker.rs
Outdated
// Return regions that keys are related to.
fn get_regions_for_gc(
    store_id: u64,
    keys: &Vec<Key>,
src/server/gc_worker/gc_worker.rs
Outdated
    Ok(regions)
} else {
    // We only have one key.
    let key = keys.first().unwrap().as_encoded();
Should be just keys[0].as_encoded().
src/server/gc_worker/gc_worker.rs
Outdated
let end = keys.last().unwrap().as_encoded();
let regions = box_try!(region_provider.get_regions_in_range(start, end))
    .into_iter()
    .filter(move |r| find_peer(r, store_id).is_some())
src/server/gc_worker/gc_worker.rs
Outdated
match box_try!(rx.recv()) {
    Some(region) => Ok(region),
    None => unreachable!(),
Why is it unreachable? What if there are no replicas on the node?
src/server/gc_worker/gc_worker.rs
Outdated
MvccReader::new(snapshot, Some(ScanMode::Forward), false)
let (mut handled_keys, mut wasted_keys) = (0, 0);
let mut regions = match region_or_provider {
    Either::Left(region) => vec![region].into_iter().peekable(),
You don't need that. Indexing with vec[0] gives you the same element as vec.into_iter().peekable().peek().
src/server/gc_worker/gc_worker.rs
Outdated
let (mut handled_keys, mut wasted_keys) = (0, 0);
// First item is fetched to initialize the reader and kv_engine
let region = regions.peek();
if region.is_none() {
regions.is_empty() is more straightforward.
src/server/gc_worker/gc_worker.rs
Outdated
gc_info.is_completed = true;
let mut keys = keys.into_iter().peekable();
for region in regions {
    if !first_iteration {
Why not
for region in regions {
let (reader, kv_engine) = self.create_reader()?;
let txn = Self::new_txn();
xxxx;
Self::flush_txn();
}
With one tablet per region, you can't flush a transaction across tablets.
But it can also run with the single-RocksDB setting.

Interesting. In the future, it may need to send the modifications to the apply worker by region instead.
src/server/gc_worker/gc_worker.rs
Outdated
let mut gc_info = GcInfo::default();
let mut keys = keys.into_iter().peekable();
for region in regions {
for region in regions.into_iter() {
src/server/gc_worker/gc_worker.rs
Outdated
let key = keys.peek();
if key.is_none() {
    break;
}
let key = key.unwrap().as_encoded().as_slice();
Suggested change:
let Some(key) = keys.peek() else { break };
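A self-contained sketch of the suggested loop shape, where each pass peeks before consuming. The summing logic and the cutoff value are invented for illustration; only the `let ... else { break }` pattern mirrors the suggestion.

```rust
// Sum leading elements below a cutoff, peeking before each consume.
fn sum_while_small(keys: Vec<u32>) -> u32 {
    let mut keys = keys.into_iter().peekable();
    let mut sum = 0;
    loop {
        // Suggested form, replacing:
        //   let key = keys.peek();
        //   if key.is_none() { break; }
        //   let key = key.unwrap();
        let Some(key) = keys.peek() else { break };
        if *key >= 100 {
            break;
        }
        sum += keys.next().unwrap();
    }
    sum
}

fn main() {
    assert_eq!(sum_while_small(vec![1, 2, 3, 100, 4]), 6);
    assert_eq!(sum_while_small(vec![]), 0);
    println!("sums match");
}
```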
}
}

pub fn db(&self) -> Arc<DB> {
Not necessary anymore, people can use as_inner.
let (locks, _) = reader
    .scan_locks(Some(start_key), None, |l| l.ts <= max_ts, limit)
    .map_err(TxnError::from_mvcc)?;
let regions = box_try!(regions_provider.get_regions_in_range(start_key.as_encoded(), &[]))
We need to evaluate the performance impact of changing from one seek to multiple seeks.
As Green GC is going to be deprecated, I think we don't need to be too concerned about it.
Maybe it is time to deprecate it now, since scanning only the regions found in region_info_provider may affect the scan's correctness.
let snap = self.get_snapshot(self.store_id, &region)?;
let mut reader = MvccReader::new(snap, Some(ScanMode::Forward), false);
let (locks_this_region, _) = reader
    .scan_locks(Some(&start_key), None, |l| l.ts <= max_ts, limit)
/cc @lhy1024 Will it impact the statistics? There will be many scan ops, and the end key is always None.
The scan is also limited by the region boundaries of the snapshot.
If this PR only affects GC, it has no effect on hotspot statistics, because PD does not process GC-related statistics for the time being.
/merge
@MyonKeminta: It seems you want to merge this PR, so I will help you trigger all the tests: /run-all-tests. If you have any questions about the PR merge process, please refer to the pr process. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
This pull request has been accepted and is ready to merge. Commit hash: bc9deea
@SpadeA-Tang: Your PR was out of date, so I automatically updated it for you and triggered all tests: /run-all-tests. If a CI test fails, just re-trigger the failed test and the bot will merge the PR for you after CI passes. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
pub fn start_auto_gc<S: GcSafePointProvider, R: RegionInfoProvider + Clone + 'static>(
    &self,
    kv_engine: &E::Local,
@SpadeA-Tang Could you help explain a bit why this is necessary?
Okay... I have remembered why this change exists. At some point I tried to remove kv_engine() from the Engine trait, but I later gave up on that approach without changing it back. I will propose another PR to revert it.
ref #13319, ref #13322 Signed-off-by: SpadeA-Tang <[email protected]> Co-authored-by: Xinye Tao <[email protected]>
What is changed and how it works?
Issue Number: Close #13319
What's Changed:
In the past, we had some raw APIs in the Engine trait to avoid waking hibernated regions when doing GC work: snapshot_on_kv_engine and modify_on_kv_engine.
Now that we support stale read, we can use stale read with start_ts 0 to acquire the snapshot, which is implemented in this PR. In addition, this PR provides a way to fetch the inner kv engine from the snapshot.
Related changes
pingcap/docs / pingcap/docs-cn
Check List
Tests
Side effects
Release note