This repository was archived by the owner on Nov 17, 2025. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 162
[feat] Allow additional sampling hooks #115
Closed
toumorokoshi
wants to merge
8
commits into
open-telemetry:main
from
toumorokoshi:feature/add-sample-to-span-end
Closed
Changes from 4 commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
ab72737
Adding proposal to move sample to span end
toumorokoshi 5942c60
renaming as add sample to span end.
toumorokoshi be0a7db
fixing markdown lint
toumorokoshi 762d7d2
renaming to PR number
toumorokoshi 9302221
Addressing feedback
toumorokoshi 940863c
Adding shouldRetry, incremental sample hooks
toumorokoshi 3d8ac4e
Update text/trace/115-add-sample-to-span-end.md
toumorokoshi fa165d1
Merge branch 'master' into feature/add-sample-to-span-end
toumorokoshi File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,182 @@ | ||
| # Add Additional Sample Decision at Span End | ||
|
|
||
| This document proposes adding an additional sampling decision made at span end, | ||
| along with keeping the current sampling decision made at span creation. | ||
|
|
||
| ## Motivation | ||
|
|
||
| As of the writing of this OTEP (June 2020), the SDK sampler interface | ||
| requires the choice to sample the span to occur when the span is created. | ||
| This forces patterns such as deferred span creation or span decisions to be | ||
| inconsistent with the criteria based on the final span result. | ||
|
|
||
| Adding an additional sampling decision to the span end will eliminate | ||
| the possible incongruence of a sampler missing spans that should or should not be | ||
| sampled, based on attributes that change through the lifetime of the span. | ||
|
|
||
| Although this document suggests adding an additional sampling point, the desire | ||
| is to solve a deeper issue around contention of making an early sampling decision, | ||
| and enabling the addition or modification of span properties after initial creation. | ||
|
|
||
| This section argues the following: | ||
|
|
||
| 1. deferred span creation is a common pattern in the current spec. | ||
| 2. deferred span creation is always functionally worse than creating a span, | ||
| and is at best equal. | ||
| 3. based on 1 and 2, it is better to address the issues with sampling, rather | ||
| than modify span creation behavior for sampling. | ||
|
|
||
| The following sections explains the rationale behind each argument: | ||
|
|
||
| ### Deferred span creation is a common pattern | ||
|
|
||
| The current specification discourages the use of trace.updateName, as the | ||
| modification of the span name results in an incongruent sampling result (one | ||
| that was based upon the old name). | ||
|
|
||
| The current workaround is to defer span creation: to store attributes and | ||
| fields such as start time in some context object, to be used to create a span | ||
| later. | ||
|
|
||
| However, most operations that involves I/O will need to defer span creation, as | ||
| it is not possible to determine the appropriate span name until some data has | ||
| been received. | ||
|
|
||
| Examples include: | ||
|
|
||
| * http(s) requests: the path and route on the server is only known after enough | ||
| of the payload is received to read the url. | ||
| * RPC requests: RPCs typically include the desired operation in the | ||
| payload, after some bytes have been read. | ||
|
|
||
| The span of the operation should have a start time that is right before the | ||
| first byte is received. | ||
|
|
||
| ### Deferred span creation is worse than creating a span | ||
|
|
||
| Ensuring a span is properly closed, even in error scenarios, is a desirable | ||
| trait, and SDKs have ensured best-effort of closing of spans using language | ||
| specific features such as exception handling or deferred close routines. | ||
|
|
||
| Deferred span creation has not had as much development focus, which results in | ||
| a less error handling that ensures that deferred spans are being created. This | ||
| results in exceptional situations that occur before the span creation (e.g. | ||
| invalid parsing of an http request) to not be represented in a span, as the | ||
| deferred span is never created. | ||
|
|
||
| A way to mitigate this would be to ensure the same level of error handling | ||
| for deferred span creation as with spans themselves (create the span during a | ||
| handled exception). But that would double the amount of maintenance to ensure | ||
| robust span creation. Without span sampling as a concern, it would be desirable | ||
| to just use a span instead, and focus on improving the robustness span creation. | ||
|
|
||
| There is another side effect of deferred span creation, where there may be | ||
| child spans that are created before the deferred span is. Examples here include | ||
| a span to represent the receiving of data: the span that represents receiving data | ||
| would have started before the name of the span is available, since that depends | ||
| on receiving and parsing the incoming payload. The result will at worst be | ||
| multiple root spans that should actually be children of the deferred span, since | ||
| there will be no active span present when the children are created. | ||
|
|
||
| ### It is better to address the issues with sampling | ||
|
|
||
| Given the need for an almost identical, most likely less robust code path | ||
| for deferred spans, and the fact that it is a very common pattern, solving | ||
| issues with sampling would ensure a more robust and maintainable code base. | ||
|
|
||
| ## Explanation | ||
|
|
||
| A sampler would consume the same arguments it does today: | ||
|
|
||
| - span name | ||
| - span context | ||
| - span kind | ||
| - span attributes | ||
| - span links | ||
|
|
||
| The sampler would be called twice: once at creation, and once at span end. | ||
|
|
||
| The span's sampling decision would be determined by the intersection of the | ||
| sampling decision made by either ShouldSample call: | ||
|
|
||
| | First ShouldSample result | Second ShouldSample result | Final Result | | ||
| |---------------------------|----------------------------|--------------------| | ||
| | NOT_RECORD | N/A (Don't Call) | NOT_RECORD | | ||
| | RECORD_AND_SAMPLED | N/A (Don't Call) | RECORD_AND_SAMPLED | | ||
| | RECORD | NOT_RECORD | NOT_RECORD | | ||
| | RECORD | RECORD | RECORD | | ||
| | RECORD | RECORD_AND_SAMPLED | RECORD_AND_SAMPLED | | ||
|
|
||
| ## Internal details | ||
|
|
||
| ### Idempotence of samplers | ||
|
|
||
| The calling of shouldSample multiple times can result in inconsistent results | ||
| if the sampler is truly random. As such, the samplers will need to be idempotent | ||
| based on the input passed. For example, a probability sampler would need to | ||
| use data present in the passed parameters, such as the span id, to return sampling | ||
| decisions. | ||
|
toumorokoshi marked this conversation as resolved.
Outdated
|
||
|
|
||
| ### Sampling | ||
|
|
||
| ## Trade-offs and mitigations | ||
|
|
||
| See alternatives. | ||
|
|
||
| ## Prior art and alternatives | ||
|
|
||
| Related issues: | ||
|
|
||
| - [remove warning on trace.updateName](https://github.com/open-telemetry/opentelemetry-specification/issues/468) | ||
| - [remove warning on trace.updateName PR](https://github.com/open-telemetry/opentelemetry-specification/pull/506) | ||
| - [Allow samplers to be called during different moments in the Span lifetime](https://github.com/open-telemetry/opentelemetry-specification/issues/307) | ||
| - [Sampling decision is too late to gain much performance](https://github.com/open-telemetry/opentelemetry-specification/issues/620) | ||
|
|
||
| ### Tradeoffs of multiple sampled approach | ||
|
|
||
| - Pro: enables setting the recording decision early, which can skip additional | ||
|
toumorokoshi marked this conversation as resolved.
|
||
| processing for instrumentations that use the isRecording field. | ||
| - Pro: enables setting the recording decision early, which can be used | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How can the decision to use a cheaper Span implementation (such as noop span) upon a hook decision be made without having a span created already? Unless I am misunderstanding something, if we aren't using deferred span creation, wouldn't a "real" span be created if the initial sampling decision (upon starting the span) was set to true? e.g. |
||
| to set cheaper Span implementations (such as noop spans) and save on | ||
| processing. | ||
|
|
||
| - Con: increases the complexity of the sampling logic slightly. | ||
| - Con: introduces a complex matrix of final recording / sample decisions based | ||
| on both sample results. | ||
| - Con: increases the processing required per span marginally (need to re-run sampling logic) | ||
|
|
||
| #### Alternative: Always sample at the end | ||
|
|
||
| An alternative approach is to move the existing sampling decision to the end. | ||
|
|
||
| - Pro: reduces complexity of sampling logic | ||
| - Pro: reduces complexity of choosing a noop span (no option to do so) | ||
|
|
||
| - Con: every span, sampled or not, will require the same processing. This will | ||
| result in a net increase of system resources relative to an approach that | ||
| allows optimizations. | ||
|
|
||
| ### Alternative: more hooks for the sampler to update it's decision | ||
|
|
||
| [spec #307](https://github.com/open-telemetry/opentelemetry-specification/issues/307) brings up | ||
| a few more hooks to modify sampling decisions. This is similar to the pros and | ||
| cons of the proposal, with an added pro of requiring more incremental effort | ||
| on determining sampler decisions, and a con of requiring more complexity on | ||
| the sampler's part. | ||
|
toumorokoshi marked this conversation as resolved.
Outdated
|
||
|
|
||
| ## Open questions | ||
|
|
||
| ### The behavior of child spans of an unsampled span | ||
|
|
||
| Question reference: [spec #307](https://github.com/open-telemetry/opentelemetry-specification/issues/307#issuecomment-544912240) | ||
|
|
||
| If the span decision can change over time, child spans cannot rely on the existence of a parent span. This may require some enforced behavior around child spans of an unsampled span also not being sampled. This is difficult, if not impossible, to do as there is no reference to child spans from the parent span. Thus, when the parent changes it's | ||
| span decision, it can not propagate that choice to child spans. | ||
|
|
||
| ### Sampling decision is too late to gain much performance | ||
|
|
||
| Reference: [spec #620](https://github.com/open-telemetry/opentelemetry-specification/issues/620) | ||
|
|
||
| Some samplers can make the choice to sample with effectively no data, | ||
| such as a probabilistic sampler. There is currently no facility to enable | ||
| these types of samplers. | ||
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.