tracing: use manual collection in ForkCtxSpan #59815
Conversation
Force-pushed from f346302 to 7b9a022
irfansharif
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelapwen, @knz, and @tbg)
pkg/util/tracing/tracer.go, line 629 at r1 (raw file):
// the tracer to include recordings from forked/async child spans, when
// retrieving the recording for a parent span.
func (t *Tracer) TestingIncludeAsyncSpansInRecordings() {
I tried introducing an interceptor instead (suggested in #59391 (review)), and asserting on span recordings on Finish, but it felt too clunky. For tests that want to "capture everything", I think this is much easier to work with. It's also only a handful of tests that want to do this today (the logic tests don't).
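For concreteness, a minimal sketch of how a test might use this knob. Only `TestingIncludeAsyncSpansInRecordings` and `ContextWithRecordingSpanUsing` come from this PR; the return signature of the latter and the `runOperationThatForksSpans` subject are assumptions for illustration.

```go
func TestCapturesAsyncSpans(t *testing.T) {
	tr := tracing.NewTracer()
	// Opt in to collecting recordings from forked/async child spans.
	tr.TestingIncludeAsyncSpansInRecordings()

	// Assumed returns: derived ctx, a recording getter, and a cleanup func.
	ctx, getRecording, cancel := tracing.ContextWithRecordingSpanUsing(
		context.Background(), tr, "test-op")
	defer cancel()

	runOperationThatForksSpans(ctx) // hypothetical code under test

	// With the knob set, spans created via ForkCtxSpan show up here.
	rec := getRecording()
	t.Log(rec)
}
```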
pkg/util/tracing/tracer.go, line 650 at r1 (raw file):
	return ctx, nil
}
collectionOpt := WithParentAndManualCollection(sp.Meta())
Didn't flip the ordering as suggested in #59391 (review); manual collection will be the one used everywhere except in a select few tests, so optimizing for memory usage here seems unnecessary.
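For reference, the shape of `ForkCtxSpan` under this scheme looks roughly like the following. This is a paraphrase of the diff, not a verbatim copy; the knob field and the follows-from option are named illustratively.

```go
func ForkCtxSpan(ctx context.Context, opName string) (context.Context, *Span) {
	sp := SpanFromContext(ctx)
	if sp == nil {
		return ctx, nil
	}
	// Forked spans default to manual collection: the child is not linked
	// into the parent, so the parent's memory can be reclaimed independently.
	collectionOpt := WithParentAndManualCollection(sp.Meta())
	if sp.Tracer().testingIncludeAsyncSpans { // illustrative knob check
		collectionOpt = WithParentAndAutoCollection(sp)
	}
	newSpan := sp.Tracer().StartSpan(opName, WithFollowsFrom(), collectionOpt)
	return ContextWithSpan(ctx, newSpan), newSpan
}
```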
tbg
left a comment
Looks like the rabbit hole is going a little deeper still, see the dist sender comments.
Reviewed 22 of 22 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelapwen, @irfansharif, and @knz)
pkg/sql/opt/exec/execbuilder/testdata/select, line 1763 at r1 (raw file):
----
querying next range at /Table/73/1/0
=== SPAN START: kv.DistSender: sending partial batch ===
Unfortunately, DistSender relies on async tasks in more than one place; these are the ones I found (might've missed some, but you get the idea):
cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go
Lines 1370 to 1379 in cdd48a0
if err := ds.rpcContext.Stopper.RunLimitedAsyncTask(
	ctx, "kv.DistSender: sending partial batch",
	ds.asyncSenderSem, false, /* wait */
	func(ctx context.Context) {
		ds.metrics.AsyncSentCount.Inc(1)
		responseCh <- ds.sendPartialBatch(
			ctx, ba, rs, routing, withCommit, batchIdx, true, /* needsTruncate */
		)
	},
); err != nil {
cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go
Lines 993 to 1001 in cdd48a0
// Both halves of the split batch succeeded. Piece them back together.
resps := make([]roachpb.ResponseUnion, len(swappedReqs))
copy(resps, br.Responses)
resps[swapIdx], resps[lastIdx] = resps[lastIdx], resps[swapIdx]
br.Responses = resps
if err := br.Combine(qiReply.reply, qiReply.positions); err != nil {
	return nil, roachpb.NewError(err)
}
return br, nil
The batch responses are combined here:
cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go
Lines 1221 to 1226 in cdd48a0
// Combine the new response with the existing one (including updating
// the headers) if we haven't yet seen an error.
if pErr == nil {
	if err := br.Combine(resp.reply, resp.positions); err != nil {
		pErr = roachpb.NewError(err)
	}
At the very least, we need to make sure that BatchResponse.Combine appends the spans; otherwise we're losing lots of spans from KV and the whole tracing thing is doomed.

It looks like our testing of actual trace histories' accuracy is not very thorough (or rather, does not exist), which we need to fix, and might as well now. I think we can set up a test cluster with ten scratch ranges and a replica eval interceptor that looks at the key for each of the test's puts on each range and emits metadata containing the key to the trace. The test can then verify that for every put we see the right meta on that range. I think it's actually straightforward to use kvnemesis for this? We execute all regular KV ops in this method:
cockroach/pkg/kv/kvnemesis/applier.go
Lines 135 to 140 in c955e88
func applyClientOp(ctx context.Context, db clientI, op *Operation) {
	switch o := op.GetValue().(type) {
	case *GetOperation:
		fn := db.Get
		if o.ForUpdate {
			fn = db.GetForUpdate
Each of these should verify that the keys it accesses are reflected in metas that we set up kvnemesis to inject via an eval interceptor (roughly like the sketch below). I'm still on the fence about whether this is a good first step: randomized testing is great and everything, but it's also good to have a basic manual test of these things that is easier to debug, so maybe that's the better starting point, especially since we expect things to be broken as of this PR.
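Roughly, the interceptor-based check could look like this; every name here is hypothetical and only illustrates the proposal, not code in the tree.

```go
// evalKeyInterceptor runs during replica evaluation and emits the key
// being written into the active trace span.
func evalKeyInterceptor(ctx context.Context, key roachpb.Key) {
	log.Eventf(ctx, "evaluated put on key %s", key)
}

// verifyTraceSawKeys checks that the recording for an operation contains
// an event for every key the test wrote.
func verifyTraceSawKeys(rec tracing.Recording, keys []roachpb.Key) error {
	for _, k := range keys {
		if !recordingContainsMsg(rec, k.String()) { // hypothetical helper
			return errors.Errorf("trace missing event for key %s", k)
		}
	}
	return nil
}
```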
Not necessarily for now, but something we also need to do:
I think there is also an issue with tracking Errors that get retried away internally. These are detached from the batch response here:
cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go
Lines 1559 to 1561 in cdd48a0
// Untangle the error from the received response.
pErr = reply.Error
reply.Error = nil // scrub the response error
We should make sure they implement Structured (they do so trivially now) and are added to the trace. There's also the question of how to track this error when it occurs:
cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go
Lines 1547 to 1549 in cdd48a0
log.VEventf(ctx, 1, "evicting range desc %s after %s", routingTok, err)
routingTok.Evict(ctx)
continue
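A minimal sketch of what recording such an error could look like, assuming a `RecordStructured`-style API on the span; the call site is illustrative, not the actual diff:

```go
// scrubAndRecordError detaches the error from the response as today, but
// first records it into the active trace span so internally retried
// errors remain visible in the recording.
func scrubAndRecordError(ctx context.Context, reply *roachpb.BatchResponse) {
	if pErr := reply.Error; pErr != nil {
		if sp := tracing.SpanFromContext(ctx); sp != nil {
			sp.RecordStructured(pErr) // pErr would need to implement Structured
		}
		reply.Error = nil // scrub the response error
	}
}
```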
pkg/util/tracing/tracer.go, line 753 at r1 (raw file):
// an existing Tracer is unavailable. Tests should typically expect to use
// ContextWithRecordingSpanUsing instead.
func ContextWithRecordingSpan(
Can you nuke this method? All callers in tests can just use ContextWithRecordingSpanUsing(ctx, NewTracer(), opName), and the ~3 callers in prod code should have a tracer they can grab.
Switch to the 5node family of clusters to test against (more realistic), and also make the query that collects the event touch multiple ranges, which makes it "easier" to miss an event. cc @irfansharif - this could help with the discussion about DistSender in [1].

[1]: cockroachdb#59815

Release note: None
59876: tracing: improve interfaces around process boundaries r=irfansharif a=irfansharif

We can rid ourselves of the opentracing cruft just a bit more, and add some type-safety while doing so. While here, we'll also add some text explaining how tracing data gets propagated across process boundaries.

Release note: None

---

First commit is from #59815, ignore here.

Co-authored-by: irfan sharif <[email protected]>
Force-pushed from 5f07c57 to db4abbb
irfansharif
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelapwen, @knz, and @tbg)
pkg/sql/opt/exec/execbuilder/testdata/select, line 1763 at r1 (raw file):
At the very least, we need to make sure that BatchResponse.Combine appends the spans; otherwise we're losing lots of spans from KV and the whole tracing thing is doomed
We're already doing that here:
Line 535 in 9e31dd8:
h.CollectedSpans = append(h.CollectedSpans, o.CollectedSpans...)
But I agree that our testing story here leaves much to be desired. I think that's well deserving of a PR of its own. I need to play around with what that looks like; I'd also prefer being able to isolate and test trace data in unit tests.
So if I'm reading the above right, given we're already combining spans from batch responses (including those captured using async tasks), does that unblock this PR? All this trace data is later imported into the span at the gateway, which is what we ultimately want (though, yes, it would be nice to add tests for it).
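For context, the quoted line lives in the response header's Combine path; a paraphrased sketch of its role (not a verbatim copy of the method):

```go
// Combine merges a second response Header into h. Crucially for tracing,
// it appends the spans collected on the remote/async side, so recordings
// survive the re-assembly of split batches.
func (h *Header) Combine(o Header) error {
	// ... combine timestamps, txn state, etc. ...
	h.CollectedSpans = append(h.CollectedSpans, o.CollectedSpans...)
	return nil
}
```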
pkg/util/tracing/tracer.go, line 753 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Can you nuke this method? All callers in tests can just use ContextWithRecordingSpanUsing(ctx, NewTracer(), opName), and the ~3 callers in prod code should have a tracer they can grab.
Done.
angelapwen
left a comment
per @tbg 😸
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @knz and @tbg)
Force-pushed from db4abbb to fd87ba2
Thanks! bors r+
Build failed:
The spans created by ForkCtxSpan are expected to outlive the parent, so automatic collection makes little sense here. It also causes the child to be linked into the parent, which can delay memory reclamation; some parent spans are effectively ephemeral. As an example, consider the usage in `replicaDecoder`: the parent context has a long-lived span and forks off new tracing spans for each decoded command. Similar problems crop up for the `raft-worker`/`raftScheduler` goroutines.

---

For tests, we make two changes:
- Logic tests that asserted on async recordings are simply removed; we're making a backwards-incompatible change here.
- Unit tests that want to assert on trace data from async spans can now configure the tracer to include them explicitly.

While here, we introduce `tracing.ContextWithRecordingSpanUsing` to capture the tracer used to construct spans, so most callers can provide the one available.

Release note (general change): `SHOW TRACE FOR SESSION` previously included cockroach internal traces for async threads kicked off as part of user operations. This trace data is no longer captured.

Co-authored-by: Tobias Grieger <[email protected]>
Force-pushed from fd87ba2 to a1c3912
Skewed with #59760. bors r+
67713: stop, kvclient: include DistSender partial batches in traces r=andreimatei a=andreimatei

These are the first few patches from #66387 - extracting the non-contentious prefix dealing with DistSender tracing.

----

Before this patch, since fairly recently, the spans created for Stopper.RunAsyncTask[Ex]() were not included in the recording of the caller's span, because these spans were created with ForkSpan(). This patch adds an option to RunAsyncTaskEx to make the task's span a child of the caller's, and thus be included in the caller's recording. This option is used by the DistSender when sending partial batch requests, thereby re-including these requests in higher-level traces (as they used to be).

The reason why async tasks were detached from the caller's span in #59815 was concern about parent and child spans with very different lifetimes becoming tied together through a trace, delaying the garbage collection of the whole trace until the last span in it finishes. This patch gives the caller of RunAsyncTaskEx control over this behavior; in the particular case of the DistSender, the traces are not too long-lived, and the DistSender waits for the async tasks it spawns to finish. In such circumstances, tying the allocations of the child spans to the whole trace should be fine.

Release note (general change): A recent release removed parts of some queries from the debugging traces of those queries. This information (i.e. the execution of some low-level RPCs) has been re-included in the traces.

Co-authored-by: Andrei Matei <[email protected]>
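For orientation, the option described above would be used roughly like this at the DistSender call site. The `TaskOpts`/`SpanOpt` names are modeled on the stop package but should be treated as an assumption, not a verbatim quote of the diff.

```go
// runTracedPartialBatch spawns the partial-batch send as an async task
// whose span is a child of the caller's span, so it shows up in the
// caller's recording (unlike the detached, forked default).
func runTracedPartialBatch(ctx context.Context, stopper *stop.Stopper) error {
	return stopper.RunAsyncTaskEx(ctx, stop.TaskOpts{
		TaskName: "kv.DistSender: sending partial batch",
		SpanOpt:  stop.ChildSpan, // tie the task's span into the caller's trace
	}, func(ctx context.Context) {
		// ... send the partial batch under the child span ...
	})
}
```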
72642: democluster: improve logging/tracing r=otan a=knz

All commits but the last 3 from #72644 (Reviewers: only review last 3 commits). Informs #58938.

72727: tracing: misc cleanups r=andreimatei a=andreimatei

See individual commits.

72738: kvserver: include Raft application on leaseholder in request trace r=andreimatei a=andreimatei

Before this patch, the decoding and application of a Raft command was not included in the recording of the request that generated the respective command, even in the case where consensus is synchronous with respect to the request (i.e. non-AsyncConsensus requests). This was because, although we plumb tracing information below Raft, the span under which Raft processing was occurring was Fork()ed from the parent span (i.e. the request evaluation's span). The reasons why that was are not good:

1) Forking (as opposed to creating a regular child) was introduced in #39425. I'm not sure whether there was a particular reason for this decision. Perhaps there was fear at the time about the child outliving the parent - although I don't think that it does because, in the case of async consensus, we fork a parent which I think will outlive the child: https://github.com/cockroachdb/cockroach/blame/13669f9c9bd92a4c3b0378a558d7735f122c4e72/pkg/kv/kvserver/replica_raft.go#L157 Regardless of the exact details of the lifetimes, with time it has become clear that it's appropriate to create a child when it's expected that it will usually not outlive the parent, even if, on occasion, it can outlive it.

2) Forked spans used to be included in the parent's recording until #59815. This explains why we regressed here - Raft application used to be included in recordings properly.

This patch makes the application span a regular child, and so the Raft application span is included in the request trace.

Touches #70864. That issue asks for a new tracing feature, but I think what motivated it is the application info missing from query traces. This patch is sufficient for that.

Release note (general change): Tracing transaction commits now includes details about replication.

72896: server: don't mix Tracers in test tenant r=andreimatei a=andreimatei

Before this patch, TestServer.StartTenant() would share the Stopper between the server and the tenant, and it would also proceed to give the tenant a new Tracer. The stopper in question has the server's Tracer in it, so this sharing is no bueno: the tenant ends up with two different Tracers, and it uses either of them fairly arbitrarily at different points. Intermingling two Tracers like this was never intended, and attempting to create a child with a different tracer than the parent will become illegal shortly. This shows that sharing stoppers between different servers is a bad idea. This patch stops the practice.

The patch also removes the tracer config option for test tenants because it was unused, and supporting it in combination with the stopper configuration would require more sanity checks.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>