tracing: use manual collection in ForkCtxSpan #59815
Conversation
Force-pushed from f346302 to 7b9a022
irfansharif
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelapwen, @knz, and @tbg)
pkg/util/tracing/tracer.go, line 629 at r1 (raw file):
// the tracer to include recordings from forked/async child spans, when
// retrieving the recording for a parent span.
func (t *Tracer) TestingIncludeAsyncSpansInRecordings() {
I tried introducing an interceptor instead (suggested in #59391 (review)), and asserting on span recordings on Finish, but it felt too clunky. For tests that want to "capture everything", I think this is much easier to work with. It's also only a handful of tests that want to do this today (the logic tests don't).
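For concreteness, a minimal sketch of how a test might use this knob. Only `TestingIncludeAsyncSpansInRecordings` and `ContextWithRecordingSpanUsing` come from this PR; the return signature of the latter and the `runOperationThatForksSpans` subject are assumptions for illustration.

```go
func TestCapturesAsyncSpans(t *testing.T) {
	tr := tracing.NewTracer()
	// Opt in to collecting recordings from forked/async child spans.
	tr.TestingIncludeAsyncSpansInRecordings()

	// Assumed returns: derived ctx, a recording getter, and a cleanup func.
	ctx, getRecording, cancel := tracing.ContextWithRecordingSpanUsing(
		context.Background(), tr, "test-op")
	defer cancel()

	runOperationThatForksSpans(ctx) // hypothetical code under test

	// With the knob set, spans created via ForkCtxSpan show up here.
	rec := getRecording()
	t.Log(rec)
}
```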
pkg/util/tracing/tracer.go, line 650 at r1 (raw file):
	return ctx, nil
}
collectionOpt := WithParentAndManualCollection(sp.Meta())
Didn't flip the ordering as suggested in #59391 (review); manual collection will be the one used everywhere except in a select few tests, so optimizing for memory usage here seems unnecessary.
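For reference, the shape of `ForkCtxSpan` under this scheme looks roughly like the following. This is a paraphrase of the diff, not a verbatim copy; the knob field and the follows-from option are named illustratively.

```go
func ForkCtxSpan(ctx context.Context, opName string) (context.Context, *Span) {
	sp := SpanFromContext(ctx)
	if sp == nil {
		return ctx, nil
	}
	// Forked spans default to manual collection: the child is not linked
	// into the parent, so the parent's memory can be reclaimed independently.
	collectionOpt := WithParentAndManualCollection(sp.Meta())
	if sp.Tracer().testingIncludeAsyncSpans { // illustrative knob check
		collectionOpt = WithParentAndAutoCollection(sp)
	}
	newSpan := sp.Tracer().StartSpan(opName, WithFollowsFrom(), collectionOpt)
	return ContextWithSpan(ctx, newSpan), newSpan
}
```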
tbg
left a comment
Looks like the rabbit hole is going a little deeper still, see the dist sender comments.
Reviewed 22 of 22 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelapwen, @irfansharif, and @knz)
pkg/sql/opt/exec/execbuilder/testdata/select, line 1763 at r1 (raw file):
----
querying next range at /Table/73/1/0
=== SPAN START: kv.DistSender: sending partial batch ===
Unfortunately, DistSender relies on async tasks in more than one place; these are the ones I found (might've missed some, but you get the idea):
cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go
Lines 1370 to 1379 in cdd48a0
if err := ds.rpcContext.Stopper.RunLimitedAsyncTask(
	ctx, "kv.DistSender: sending partial batch",
	ds.asyncSenderSem, false, /* wait */
	func(ctx context.Context) {
		ds.metrics.AsyncSentCount.Inc(1)
		responseCh <- ds.sendPartialBatch(
			ctx, ba, rs, routing, withCommit, batchIdx, true, /* needsTruncate */
		)
	},
); err != nil {
cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go
Lines 993 to 1001 in cdd48a0
// Both halves of the split batch succeeded. Piece them back together.
resps := make([]roachpb.ResponseUnion, len(swappedReqs))
copy(resps, br.Responses)
resps[swapIdx], resps[lastIdx] = resps[lastIdx], resps[swapIdx]
br.Responses = resps
if err := br.Combine(qiReply.reply, qiReply.positions); err != nil {
	return nil, roachpb.NewError(err)
}
return br, nil
The batch responses are combined here:
cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go
Lines 1221 to 1226 in cdd48a0
// Combine the new response with the existing one (including updating
// the headers) if we haven't yet seen an error.
if pErr == nil {
	if err := br.Combine(resp.reply, resp.positions); err != nil {
		pErr = roachpb.NewError(err)
	}
At the very least, we need to make sure that BatchResponse.Combine appends the spans; otherwise we're losing lots of spans from KV and the whole tracing thing is doomed.

It looks like our testing of actual trace histories' accuracy is not very thorough (or rather, does not exist), which we need to fix, and might as well now. I think we can set up a test cluster with ten scratch ranges and a replica eval interceptor that looks at the key for each of the test's puts on each range and emits metadata containing the key to the trace. The test can then verify that for every put we see the right meta on that range. I think it's actually straightforward to use kvnemesis for this? We execute all regular KV ops in this method:
cockroach/pkg/kv/kvnemesis/applier.go
Lines 135 to 140 in c955e88
func applyClientOp(ctx context.Context, db clientI, op *Operation) {
	switch o := op.GetValue().(type) {
	case *GetOperation:
		fn := db.Get
		if o.ForUpdate {
			fn = db.GetForUpdate
Each of these should verify that the keys it accesses are reflected in metas that we set up kvnemesis to inject via an eval interceptor (roughly like the sketch below). I'm still on the fence about whether this is a good first step: randomized testing is great and everything, but it's also good to have a basic manual test of these things that is easier to debug, so maybe that's the better starting point, especially since we expect things to be broken as of this PR.
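Roughly, the interceptor-based check could look like this; every name here is hypothetical and only illustrates the proposal, not code in the tree.

```go
// evalKeyInterceptor runs during replica evaluation and emits the key
// being written into the active trace span.
func evalKeyInterceptor(ctx context.Context, key roachpb.Key) {
	log.Eventf(ctx, "evaluated put on key %s", key)
}

// verifyTraceSawKeys checks that the recording for an operation contains
// an event for every key the test wrote.
func verifyTraceSawKeys(rec tracing.Recording, keys []roachpb.Key) error {
	for _, k := range keys {
		if !recordingContainsMsg(rec, k.String()) { // hypothetical helper
			return errors.Errorf("trace missing event for key %s", k)
		}
	}
	return nil
}
```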
Not necessarily for now, but something we also need to do:
I think there is also an issue with tracking Errors that get retried away internally. These are detached from the batch response here:
cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go
Lines 1559 to 1561 in cdd48a0
// Untangle the error from the received response.
pErr = reply.Error
reply.Error = nil // scrub the response error
We should make sure they implement Structured (they do so trivially now) and are added to the trace. There's also the question of how to track this error when it occurs:
cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go
Lines 1547 to 1549 in cdd48a0
log.VEventf(ctx, 1, "evicting range desc %s after %s", routingTok, err)
routingTok.Evict(ctx)
continue
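A minimal sketch of what recording such an error could look like, assuming a `RecordStructured`-style API on the span; the call site is illustrative, not the actual diff:

```go
// scrubAndRecordError detaches the error from the response as today, but
// first records it into the active trace span so internally retried
// errors remain visible in the recording.
func scrubAndRecordError(ctx context.Context, reply *roachpb.BatchResponse) {
	if pErr := reply.Error; pErr != nil {
		if sp := tracing.SpanFromContext(ctx); sp != nil {
			sp.RecordStructured(pErr) // pErr would need to implement Structured
		}
		reply.Error = nil // scrub the response error
	}
}
```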
pkg/util/tracing/tracer.go, line 753 at r1 (raw file):
// an existing Tracer is unavailable. Tests should typically expect to use
// ContextWithRecordingSpanUsing instead.
func ContextWithRecordingSpan(
Can you nuke this method? All callers in tests can just use ContextWithRecordingSpanUsing(ctx, NewTracer(), opName), and the ~3 callers in prod code should have a tracer they can grab.
Switch to the 5node family of clusters to test against (more realistic), and also make the query that collects the event touch multiple ranges, which makes it "easier" to miss an event. cc @irfansharif - this could help with the discussion about DistSender in [1].

[1]: cockroachdb#59815

Release note: None
59876: tracing: improve interfaces around process boundaries r=irfansharif a=irfansharif

We can rid ourselves of the opentracing cruft just a bit more, and add some type-safety while doing so. While here, we'll also add some text explaining how tracing data gets propagated across process boundaries.

Release note: None

---

First commit is from #59815, ignore here.

Co-authored-by: irfan sharif <[email protected]>
Force-pushed from 5f07c57 to db4abbb
irfansharif
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelapwen, @knz, and @tbg)
pkg/sql/opt/exec/execbuilder/testdata/select, line 1763 at r1 (raw file):
At the very least, we need to make sure that BatchResponse.Combine appends the spans; otherwise we're losing lots of spans from KV and the whole tracing thing is doomed
We're already doing that here:
Line 535 in 9e31dd8:
h.CollectedSpans = append(h.CollectedSpans, o.CollectedSpans...)
But I agree that our testing story here leaves much to be desired. I think that's well deserving of a PR of its own. I need to play around with what that looks like; I'd also prefer being able to isolate and test trace data in unit tests.
So if I'm reading the above right, given we're already combining spans from batch responses (including those captured using async tasks), does that unblock this PR? All this trace data is later imported into the span at the gateway, which is what we ultimately want (though, yes, it would be nice to add tests for it).
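For context, the quoted line lives in the response header's Combine path; a paraphrased sketch of its role (not a verbatim copy of the method):

```go
// Combine merges a second response Header into h. Crucially for tracing,
// it appends the spans collected on the remote/async side, so recordings
// survive the re-assembly of split batches.
func (h *Header) Combine(o Header) error {
	// ... combine timestamps, txn state, etc. ...
	h.CollectedSpans = append(h.CollectedSpans, o.CollectedSpans...)
	return nil
}
```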
pkg/util/tracing/tracer.go, line 753 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Can you nuke this method? All callers in tests can just use ContextWithRecordingSpanUsing(ctx, NewTracer(), opName), and the ~3 callers in prod code should have a tracer they can grab.
Done.
angelapwen
left a comment
per @tbg 😸
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @knz and @tbg)
Force-pushed from db4abbb to fd87ba2
Thanks! bors r+
Build failed:
The spans created by ForkCtxSpan are expected to outlive the parent, so automatic collection makes little sense here. It also causes the child to be linked into the parent, which can delay memory reclamation; some parent spans are effectively ephemeral. As an example, consider the usage in `replicaDecoder`: the parent context has a long-lived span and forks off new tracing spans for each decoded command. Similar problems crop up for the `raft-worker`/`raftScheduler` goroutines.

---

For tests, we make two changes:
- Logic tests that asserted on async recordings are simply removed; we're making a backwards-incompatible change here.
- Unit tests that want to assert on trace data from async spans can now configure the tracer to include them explicitly.

While here, we introduce `tracing.ContextWithRecordingSpanUsing` to capture the tracer used to construct spans, so most callers can provide the one available.

Release note (general change): `SHOW TRACE FOR SESSION` previously included cockroach internal traces for async threads kicked off as part of user operations. This trace data is no longer captured.

Co-authored-by: Tobias Grieger <[email protected]>
Force-pushed from fd87ba2 to a1c3912
Skewed with #59760. bors r+
67713: stop, kvclient: include DistSender partial batches in traces r=andreimatei a=andreimatei

These are the first few patches from #66387 - extracting the non-contentious prefix dealing with DistSender tracing.

----

Before this patch, since fairly recently, the spans created for Stopper.RunAsyncTask[Ex]() were not included in the recording of the caller's span, because these spans were created with ForkSpan(). This patch adds an option to RunAsyncTaskEx to make the task's span a child of the caller's, and thus be included in the caller's recording. This option is used by the DistSender when sending partial batch requests, thereby re-including these requests in higher-level traces (as they used to be).

The reason why async tasks were detached from the caller's span in #59815 was concern about parent and child spans with very different lifetimes becoming tied together through a trace, delaying the garbage collection of the whole trace until the last span in it finishes. This patch gives the caller of RunAsyncTaskEx control over this behavior; in the particular case of the DistSender, the traces are not too long-lived, and the DistSender waits for the async tasks it spawns to finish. In such circumstances, tying the allocations of the child spans to the whole trace should be fine.

Release note (general change): A recent release removed parts of some queries from the debugging traces of those queries. This information (i.e. the execution of some low-level RPCs) has been re-included in the traces.

Co-authored-by: Andrei Matei <[email protected]>
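For orientation, the option described above would be used roughly like this at the DistSender call site. The `TaskOpts`/`SpanOpt` names are modeled on the stop package but should be treated as an assumption, not a verbatim quote of the diff.

```go
// runTracedPartialBatch spawns the partial-batch send as an async task
// whose span is a child of the caller's span, so it shows up in the
// caller's recording (unlike the detached, forked default).
func runTracedPartialBatch(ctx context.Context, stopper *stop.Stopper) error {
	return stopper.RunAsyncTaskEx(ctx, stop.TaskOpts{
		TaskName: "kv.DistSender: sending partial batch",
		SpanOpt:  stop.ChildSpan, // tie the task's span into the caller's trace
	}, func(ctx context.Context) {
		// ... send the partial batch under the child span ...
	})
}
```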
72642: democluster: improve logging/tracing r=otan a=knz

All commits but the last 3 from #72644 (Reviewers: only review last 3 commits). Informs #58938.

72727: tracing: misc cleanups r=andreimatei a=andreimatei

See individual commits.

72738: kvserver: include Raft application on leaseholder in request trace r=andreimatei a=andreimatei

Before this patch, the decoding and application of a Raft command was not included in the recording of the request that generated the respective command, even in the case where consensus is synchronous with respect to the request (i.e. non-AsyncConsensus requests). This was because, although we plumb tracing information below Raft, the span under which Raft processing was occurring was Fork()ed from the parent span (i.e. the request evaluation's span). The reasons why that was are not good:

1) Forking (as opposed to creating a regular child) was introduced in #39425. I'm not sure whether there was a particular reason for this decision. Perhaps there was fear at the time about the child outliving the parent - although I don't think that it does because, in the case of async consensus, we fork a parent which I think will outlive the child: https://github.com/cockroachdb/cockroach/blame/13669f9c9bd92a4c3b0378a558d7735f122c4e72/pkg/kv/kvserver/replica_raft.go#L157 Regardless of the exact details of the lifetimes, with time it has become clear that it's appropriate to create a child when it's expected that it will usually not outlive the parent, even if, on occasion, it can outlive it.

2) Forked spans used to be included in the parent's recording until #59815. This explains why we regressed here - Raft application used to be included in recordings properly.

This patch makes the application span a regular child, and so the Raft application span is included in the request trace.

Touches #70864. That issue asks for a new tracing feature, but I think what motivated it is the application info missing from query traces. This patch is sufficient for that.

Release note (general change): Tracing transaction commits now includes details about replication.

72896: server: don't mix Tracers in test tenant r=andreimatei a=andreimatei

Before this patch, TestServer.StartTenant() would share the Stopper between the server and the tenant, and it would also proceed to give the tenant a new Tracer. The stopper in question has the server's Tracer in it, so this sharing is no bueno: the tenant ends up with two different Tracers, and it uses either of them fairly arbitrarily at different points. Intermingling two Tracers like this was never intended, and attempting to create a child with a different tracer than the parent will become illegal shortly. This shows that sharing stoppers between different servers is a bad idea. This patch stops the practice.

The patch also removes the tracer config option for test tenants because it was unused, and supporting it in combination with the stopper configuration would require more sanity checks.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>