kvserver: use and assert non-cancellable Raft scheduler context #73554
tbg
left a comment
👎 on option 3, I think it's an unwise/unsound way to derive a context (see discussion there).
Also not a fan of option 1, since it's high-risk-low-reward.
There's also option 4 which is living with the sub-par assertion.
Option 2 sounds the most reasonable. The scheduler doesn't use the context cancellation. It listens to the stopper:
cockroach/pkg/kv/kvserver/scheduler.go
Lines 191 to 198 in 2ad2bee
    func (s *raftScheduler) Start(ctx context.Context, stopper *stop.Stopper) {
        waitQuiesce := func(context.Context) {
            <-stopper.ShouldQuiesce()
            s.mu.Lock()
            s.mu.stopped = true
            s.mu.Unlock()
            s.mu.cond.Broadcast()
        }
We can pass s.cfg.AmbientContext into newRaftScheduler here instead of ctx:
cockroach/pkg/kv/kvserver/store.go
Line 1117 in 22aff9d
    s.scheduler = newRaftScheduler(s.metrics, s, storeSchedulerConcurrency)
I think that should be it?
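The shutdown pattern in the quoted `Start` snippet can be sketched in isolation. This is a minimal illustration, not the kvserver code: `worker`, `newWorker`, `start`, and `waitStopped` are hypothetical names, and a plain channel stands in for `stopper.ShouldQuiesce()`. The point is that shutdown is driven by the quiesce signal, so no context cancellation is involved.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical stand-in for a scheduler. It is shut down by a
// quiesce channel, not by context cancellation.
type worker struct {
	mu      sync.Mutex
	cond    *sync.Cond
	stopped bool
}

func newWorker() *worker {
	w := &worker{}
	w.cond = sync.NewCond(&w.mu)
	return w
}

// start launches a goroutine that marks the worker as stopped once the
// quiesce channel is closed, then wakes every goroutine waiting on cond.
func (w *worker) start(quiesce <-chan struct{}) {
	go func() {
		<-quiesce
		w.mu.Lock()
		w.stopped = true
		w.mu.Unlock()
		w.cond.Broadcast()
	}()
}

// waitStopped blocks until the worker has observed the quiesce signal.
func (w *worker) waitStopped() bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	for !w.stopped {
		w.cond.Wait()
	}
	return w.stopped
}

func main() {
	quiesce := make(chan struct{}) // assumption: stands in for stopper.ShouldQuiesce()
	w := newWorker()
	w.start(quiesce)
	close(quiesce)
	fmt.Println("stopped:", w.waitStopped())
}
```

Because nothing here reads `ctx.Done()`, such a worker behaves the same whether it is handed a cancellable or a background context.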
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
andreimatei
left a comment
> 👎 on option 3, I think it's an unwise/unsound way to derive a context (see discussion there).
Let's agree on what exactly is unsound. The problems with uncanceledCtx from #66387 I think are not around the context that terminates the cancelation itself, but about the InheritsCancelation(child, parent context.Context) bool utility method, which is not particularly reliable. So having that utility might be a bad idea, but I don't see a problem with uncanceledCtx. I think it is indistinguishable from a hypothetical context that copies everything from the parent but the cancelation. Right?
So I think there's option 5 here too, which is to not assert anything, but to actively terminate the cancelation by deriving an uncanceledCtx.
erikgrinaker
left a comment
> Option 2 sounds the most reasonable.
Done, for now.
> I don't see a problem with uncanceledCtx. I think it is indistinguishable from a hypothetical context that copies everything from the parent but the cancelation. Right?
I tend to agree with this.
I think there are unspoken rules about implementing
Besides, the purist in me just doesn't want that to be a thing because there is already an idiomatic way, which is deriving from a context that has the wider cancellation you need. I worry that playing fast and loose with
Not relevant here, but another unspoken rule is that you can never wholesale discard the values stored in a context, i.e. I don't think the inventors of
I get your point @tbg, and to a large extent I think you're right. I considered
TFTR!
bors r=tbg
Build succeeded:
`handleRaftReadyRaftMuLocked` is not prepared to handle context cancellation. It is typically called via the Raft scheduler, which uses
a background context, but can be called via other paths as well (e.g.
snapshot application).
This patch adds an assertion that the given context is not cancellable,
and creates a new background context for the main scheduler code path
instead of using the CLI's cancellable context.
Release note: None
Split off from #73484, see previous discussion there.
Turns out that this fails because the Raft scheduler context is in fact cancellable. It's rooted at the CLI context:
cockroach/pkg/cli/start.go
Lines 407 to 408 in c3e8d85
There are a few different options here, including:
- Change handleRaftReady to handle context cancellation safely.
- Get contextutil.WithoutCancel() from "util/stop: a better async task interface" #66387 merged, and use it.