
sdk/trace/x: Add experimental ProbabilitySampler #8123

Open
yuanyuanzhao3 wants to merge 9 commits into open-telemetry:main from yuanyuanzhao3:traceidratio-sampler-x

Conversation

Contributor

@yuanyuanzhao3 yuanyuanzhao3 commented Apr 1, 2026

Description

Adds an experimental ProbabilitySampler in go.opentelemetry.io/otel/sdk/trace/x that conforms to the OpenTelemetry specification's threshold-based sampling algorithm.

Features

  • Threshold-based sampling: Uses the least significant 56 bits of the trace ID (per W3C Trace Context Level 2 Random Trace ID Flag) for deterministic sampling decisions
  • Tracestate th handling: Encodes and propagates the sampling threshold in the W3C ot tracestate vendor key for consistent downstream sampling
  • Random bit support: Integrates with TraceFlags.IsRandom() and WithRandom() (trace: add Random Trace ID Flag #8012) so that the th value is reported correctly for extrapolated-metrics support
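
The threshold comparison described in the list above can be sketched as follows. This is a minimal illustration that assumes only what the description states (56 bits of randomness taken from the low end of the trace ID, and a rejection threshold derived from the sampling probability). The names thresholdFor, randomnessOf, and keep are hypothetical and are not taken from the PR.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

const (
	// randomnessBits is the number of trace ID bits used as randomness.
	randomnessBits = 56
	// maxThreshold is 2^56, the exclusive upper bound for randomness values.
	maxThreshold = uint64(1) << randomnessBits
	// randomnessMask keeps the least significant 56 bits of a uint64.
	randomnessMask = maxThreshold - 1
)

// thresholdFor converts a sampling probability in (0, 1] into a rejection
// threshold T. A span is kept when its randomness value R satisfies R >= T.
// (Hypothetical helper, not the PR's code.)
func thresholdFor(probability float64) uint64 {
	return maxThreshold - uint64(math.Round(probability*float64(maxThreshold)))
}

// randomnessOf extracts the least significant 56 bits of a 16-byte trace ID.
func randomnessOf(traceID [16]byte) uint64 {
	return binary.BigEndian.Uint64(traceID[8:16]) & randomnessMask
}

// keep reports whether the trace ID passes the threshold.
func keep(traceID [16]byte, threshold uint64) bool {
	return randomnessOf(traceID) >= threshold
}

func main() {
	threshold := thresholdFor(0.25) // 0xc0000000000000

	var low, high [16]byte
	low[9] = 0x10  // randomness 0x10000000000000, below the threshold
	high[9] = 0xf0 // randomness 0xf0000000000000, above the threshold

	fmt.Printf("threshold=%x low=%v high=%v\n",
		threshold, keep(low, threshold), keep(high, threshold))
}
```

Because the decision depends only on the trace ID and the threshold, every service that applies the same threshold reaches the same decision for a given trace.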

Files

  • sdk/trace/x/sampler.go — Core sampler implementation (ProbabilitySampler)
  • sdk/trace/x/sampler_test.go — Tests for sampler behavior
  • sdk/trace/x/tracestate.go — Tracestate th/rv key helpers
  • sdk/trace/x/tracestate_test.go — Tests for tracestate helpers
  • sdk/trace/x/README.md — Documentation for the experimental feature
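
As a rough illustration of what an rv helper in tracestate.go has to do, the sketch below reads an explicit randomness value out of an ot tracestate value. It assumes the rv field is exactly 14 hexadecimal digits, as the tracestate probability sampling specification describes. The function name parseRV is hypothetical, and the PR's own helper may be structured differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRV returns the explicit randomness value from an "ot" tracestate value
// such as "th:8;rv:0123456789abcd". It reports false when no well-formed rv
// field is present. (Hypothetical helper, not the PR's code.)
func parseRV(otValue string) (uint64, bool) {
	for _, field := range strings.Split(otValue, ";") {
		if !strings.HasPrefix(field, "rv:") {
			continue
		}
		hex := field[len("rv:"):]
		// rv carries 56 bits, so it must be exactly 14 hex digits.
		if len(hex) != 14 {
			return 0, false
		}
		rv, err := strconv.ParseUint(hex, 16, 56)
		if err != nil {
			return 0, false
		}
		return rv, true
	}
	return 0, false
}

func main() {
	for _, v := range []string{"th:8;rv:0123456789abcd", "th:8", "rv:123"} {
		rv, ok := parseRV(v)
		fmt.Printf("%q -> rv=%x ok=%v\n", v, rv, ok)
	}
}
```

This sketch accepts uppercase hex digits because strconv.ParseUint does; a stricter parser would reject them.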

Related

Co-Author

Joshua MacDonald jmacd@users.noreply.github.com


codecov Bot commented Apr 1, 2026

Codecov Report

❌ Patch coverage is 97.32143% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.4%. Comparing base (48dd8b1) to head (c9118a6).
⚠️ Report is 23 commits behind head on main.

Files with missing lines | Patch % | Lines
sdk/trace/x/sampler.go | 94.7% | 2 Missing and 1 partial ⚠️

@@          Coverage Diff           @@
##            main   #8123    +/-   ##
======================================
  Coverage   82.3%   82.4%            
======================================
  Files        310     312     +2     
  Lines      24260   24375   +115     
======================================
+ Hits       19982   20093   +111     
- Misses      3901    3904     +3     
- Partials     377     378     +1     
Files with missing lines | Coverage Δ
sdk/trace/x/tracestate.go | 100.0% <100.0%> (ø)
sdk/trace/x/sampler.go | 94.7% <94.7%> (ø)

... and 3 files with indirect coverage changes


Contributor

@dashpole dashpole left a comment

I'm still a bit confused by this. I see we already have a TraceIDRatioBased sampler in the SDK. I also see that the sampler is marked deprecated in the spec: https://opentelemetry.io/docs/specs/otel/trace/sdk/#traceidratiobased, so I'm not sure why we would be adding new features to it. The changes here seem to align with the ProbabilitySampler specification, not the TraceIDRatioBased sampler specification, so I'm not sure why this is named TraceIDRatioBased and links to that specification.

@jmacd maybe you can clarify? Is the intent that SDKs introduce a ProbabilitySampler artifact alongside the TraceIDRatioBased sampler? Or is the intent that we only keep TraceIDRatioBased, but update it to follow the ProbabilitySampler spec (is that breaking)?

Comment thread sdk/trace/x/sampler.go Outdated
Comment thread sdk/trace/x/sampler.go Outdated
Comment thread sdk/trace/x/sampler.go Outdated
@yuanyuanzhao3
Contributor Author

> I'm still a bit confused by this. I see we already have a TraceIDRatioBased sampler in the SDK. I also see that the sampler is marked deprecated in the spec: https://opentelemetry.io/docs/specs/otel/trace/sdk/#traceidratiobased, so I'm not sure why we would be adding new features to it. The changes here seem to align with the ProbabilitySampler specification, not the TraceIDRatioBased sampler specification, so I'm not sure why this is named TraceIDRatioBased and links to that specification.
>
> @jmacd maybe you can clarify? Is the intent that SDKs introduce a ProbabilitySampler artifact alongside the TraceIDRatioBased sampler? Or is the intent that we only keep TraceIDRatioBased, but update it to follow the ProbabilitySampler spec (is that breaking)?

Not @jmacd, but here's my understanding:

This is intended to be a new API-level compatible replacement for the original TraceIDRatioBased sampler. The difference lies in the underlying sampling algorithm and the information it passes through tracestate, which allows proper extrapolation of span metrics. I don't believe the previous tracestate information was ever used by any of our supported connectors (but others might know a place or two that I'm not aware of).

The new spec, https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md, is what this PR implements, and it is a form of probability sampling.

I do not know why https://opentelemetry.io/docs/specs/otel/trace/sdk/#traceidratiobased marks TraceIdRatioBased as deprecated.

Comment thread sdk/trace/x/sampler.go Outdated
	hbits = 4
)
if fraction > probabilityOneThreshold {
	return sdktrace.AlwaysSample()
Contributor

This doesn't propagate th properly, right? It also doesn't return the correct description.

Same comment for NeverSample below.

Contributor

@jmacd jmacd Apr 10, 2026

I compared with https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/208b7c0565fe51033fa8d8b3d96a3c3dcba79a3f/pkg/sampling/probability.go#L33, which is an ancestor of this. It returns an error in these cases.

I realize this may have happened because, in an earlier implementation of this, I had assumed that AlwaysSample() would return a sampler that sets ot=th:0. @yuanyuanzhao3, unfortunately https://opentelemetry.io/docs/specs/otel/trace/sdk/#alwayson does not require setting th:0, and I'm not sure whether that counts as an oversight or just the path of least resistance. This suggests that ProbabilitySampler with fraction=1 should handle this case itself; later we can discuss modifying the OTel SDK (which I think is what we want).

Contributor Author

I think having AlwaysSample export th:0 is probably a bit more nuanced. Without getting too philosophical, it depends on what we think th:0 conveys. That is, besides allowing every trace to pass, does it also convey the kind of sampling algorithm used? Could there be another sampling algorithm that is also deemed "OTel-native", is not threshold-based, but can be configured to pass all traces? If such cases exist, how many tracestate encodings should AlwaysSample output?

So, in short, let's defer the AlwaysSample change.

I have updated the implementation to use ProbabilitySampler{0}.
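
To make the th:0 discussion concrete, here is a small sketch of how a rejection threshold could be rendered as a th key-value pair, assuming the encoding in the tracestate probability sampling specification (up to 14 hex digits with trailing zeros removed, and "0" for a threshold of zero). The function name encodeTh is hypothetical and is not the PR's code.

```go
package main

import (
	"fmt"
	"strings"
)

// encodeTh renders a 56-bit rejection threshold as a "th:..." key-value pair.
// The caller is expected to pass a value below 2^56.
// (Hypothetical helper, not the PR's code.)
func encodeTh(threshold uint64) string {
	if threshold == 0 {
		// A threshold of zero rejects nothing, which is 100% sampling.
		return "th:0"
	}
	hex := fmt.Sprintf("%014x", threshold)
	return "th:" + strings.TrimRight(hex, "0")
}

func main() {
	fmt.Println(encodeTh(0))                // 100% sampling
	fmt.Println(encodeTh(uint64(1) << 55))  // 50% sampling
	fmt.Println(encodeTh(0xc0000000000000)) // 25% sampling
}
```

Under this encoding, a ProbabilitySampler configured with fraction=1 would emit th:0, whereas the AlwaysOn sampler is not required by the SDK spec to emit anything.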

Contributor

dashpole commented Apr 8, 2026

> The new spec, https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md, is what this PR implements, and it is a form of probability sampling.
>
> I do not know why https://opentelemetry.io/docs/specs/otel/trace/sdk/#traceidratiobased marks TraceIdRatioBased as deprecated.

The sdk/trace package (and experimental features in sdk/trace/x) needs to follow the specification in https://opentelemetry.io/docs/specs/otel/trace/sdk. The only place the SDK spec references tracestate-probability-sampling.md is https://opentelemetry.io/docs/specs/otel/trace/sdk/#probabilitysampler. So if you are implementing https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md, I assume it should follow the ProbabilitySampler specification in the SDK.

If my understanding is correct, please make sure it follows the ProbabilitySampler specification.

Besides naming this ProbabilitySampler, one remaining gap is that I don't see the compatibility warning described here: https://opentelemetry.io/docs/specs/otel/trace/sdk/#compatibility-warnings-for-probabilitysampler

@yuanyuanzhao3 yuanyuanzhao3 changed the title from "sdk/trace/x: Add experimental XTraceIDRatioBased sampler" to "sdk/trace/x: Add experimental TraceIDRatioBased sampler" Apr 9, 2026
Contributor

jmacd commented Apr 10, 2026

I can probably explain some confusion -- there was a late-entering change in the sampling specification as it moved from OTEP 235 to the SDK specification, where we decided on the name ProbabilitySampler and that TraceIDRatioBased would have a deprecation period.

@yuanyuanzhao3 see open-telemetry/opentelemetry-specification#4627 for details. Sorry for the confusion! Thank you @dashpole for your guidance.

Add a threshold-based TraceIDRatioBased sampler that conforms to the
OpenTelemetry specification. It uses the least significant 56 bits of
the trace ID for deterministic sampling decisions and propagates the
sampling threshold via the `th` sub-key in the W3C `ot` tracestate
vendor key. When an explicit `rv` (randomness value) is present in
the tracestate, it is used instead of the trace ID.
Align naming with the trace/x API and update changelog and tests.

Made-with: Cursor
@yuanyuanzhao3 yuanyuanzhao3 force-pushed the traceidratio-sampler-x branch from 6c0d82c to 64eb6d3 Compare April 13, 2026 20:11
@yuanyuanzhao3 yuanyuanzhao3 changed the title from "sdk/trace/x: Add experimental TraceIDRatioBased sampler" to "sdk/trace/x: Add experimental ProbabilitySampler" Apr 13, 2026
@yuanyuanzhao3
Contributor Author

@dashpole, @jmacd, PTAL. There is a test failure, but it seems unrelated.

Contributor

@MrAlias MrAlias left a comment

Thanks for pushing this forward. I reviewed it against the current SDK sampling spec, especially the newer ProbabilitySampler behavior. There are still a few issues to address.

Comment thread sdk/trace/x/sampler.go Outdated
Comment on lines +64 to +74
var newOtts string
// Only insert/update th when randomness is available (either from
// explicit rv value or trace ID with the random flag). Otherwise,
// erase any existing th to signal the span is not guaranteed to be
// statistically representative.
// See https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling/#general-requirements
if hasRandomness || psc.TraceFlags().IsRandom() {
	newOtts = InsertOrUpdateTraceStateThKeyValue(existingOtts, ps.thkv)
} else {
	newOtts = eraseTraceStateThKeyValue(existingOtts)
}
Contributor

This looks stricter than the ProbabilitySampler spec. For sampled spans, th should be propagated based on the threshold decision itself, and samplers should presume TraceID randomness unless explicit rv says otherwise.

With the current condition, sampled root spans never emit ot=th:..., and sampled child spans will also lose th after a Trace Context v1 hop when the Random flag is not present. That breaks downstream consistent-probability behavior.

The current SDK spec for ProbabilitySampler says sampled spans should include th:T, and the sampling requirements say samplers should presume TraceID randomness unless rv is present:

I think this should insert/update th whenever the span is sampled, rather than gating on the parent Random flag.

Suggested change
- var newOtts string
- // Only insert/update th when randomness is available (either from
- // explicit rv value or trace ID with the random flag). Otherwise,
- // erase any existing th to signal the span is not guaranteed to be
- // statistically representative.
- // See https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling/#general-requirements
- if hasRandomness || psc.TraceFlags().IsRandom() {
- 	newOtts = InsertOrUpdateTraceStateThKeyValue(existingOtts, ps.thkv)
- } else {
- 	newOtts = eraseTraceStateThKeyValue(existingOtts)
- }
+ newOtts := InsertOrUpdateTraceStateThKeyValue(existingOtts, ps.thkv)
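
The suggested change relies on a helper that inserts or replaces the th field inside the ot tracestate value. The body of InsertOrUpdateTraceStateThKeyValue is not shown in this thread, so the function below (upsertTh) is a hypothetical stand-in that only illustrates the idea: drop any existing th field, keep the other fields such as rv, and add the new th. Placing th first is an arbitrary choice made for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// upsertTh returns otValue with its th field replaced by thKV, or with thKV
// added when no th field exists. All other fields are preserved in order.
// (Hypothetical stand-in for the PR's helper.)
func upsertTh(otValue, thKV string) string {
	fields := []string{thKV}
	for _, field := range strings.Split(otValue, ";") {
		if field == "" || strings.HasPrefix(field, "th:") {
			continue // skip empty segments and the stale threshold
		}
		fields = append(fields, field)
	}
	return strings.Join(fields, ";")
}

func main() {
	fmt.Println(upsertTh("", "th:8"))
	fmt.Println(upsertTh("rv:0123456789abcd;th:c", "th:8"))
}
```

With the unconditional call in the suggestion, a sampled span always carries the sampler's own threshold, regardless of whether the parent context had the Random flag set.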

Contributor Author

You are right. We specifically discussed this in the sampling SIG meeting on March 26.

The issue with non-random trace IDs is that they skew span metrics extrapolation. The spanmetricsconnector treats spans with a th value differently from spans without one. The former are used for extrapolation, and the resulting metric is marked with the attribute sampling_method=extrapolated. The latter are each counted as one (representing just themselves), and the metric is marked with the attribute sampling_method=counted.

Should we clarify this in the spec? @jmacd FYI
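
For readers unfamiliar with the extrapolation step, the arithmetic can be sketched as follows. This is not the spanmetricsconnector's code; it only assumes the relationship from the tracestate probability sampling specification, where a threshold T over a 56-bit space corresponds to a sampling probability of (2^56 - T) / 2^56, so each sampled span stands for 2^56 / (2^56 - T) spans. The function name adjustedCount is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxThreshold is 2^56, the size of the 56-bit randomness space.
const maxThreshold = uint64(1) << 56

// adjustedCount returns how many spans one sampled span represents, given the
// hex digits of its th value (for example "8" from "th:8").
// (Hypothetical helper for illustration only.)
func adjustedCount(th string) (float64, error) {
	if len(th) == 0 || len(th) > 14 {
		return 0, fmt.Errorf("invalid th value %q", th)
	}
	// Restore the trailing zeros that the encoding removes.
	padded := th + strings.Repeat("0", 14-len(th))
	threshold, err := strconv.ParseUint(padded, 16, 64)
	if err != nil {
		return 0, err
	}
	return float64(maxThreshold) / float64(maxThreshold-threshold), nil
}

func main() {
	for _, th := range []string{"0", "8", "c"} {
		n, err := adjustedCount(th)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("th:%s represents %.0f spans\n", th, n)
	}
}
```

If the trace IDs are not actually random, a span carrying th:8 is still multiplied by two even though the sample it came from may not have been a fair 50% draw, which is the skew described above.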

Contributor Author

The issue with ignoring the absence of a randomness indicator is that it skews the statistics, and downstream services and customers have no way of knowing about it.

Contributor

I'm fine with whichever way we want to go here. Whatever is implemented here just needs to be defined by the specification.

Contributor

@jmacd jmacd Apr 16, 2026

Thanks for pointing this out @MrAlias. I believe @yuanyuanzhao3 has this the way it is described, just not in the SDK specification. The property we are after is included in https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling/#parent-based-threshold. In the meeting you remember, it was @PeterF778 who led the group to this requirement, and he confirmed it again.

To resolve this, @yuanyuanzhao3 will you (a) file an issue at https://github.com/open-telemetry/opentelemetry-specification/issues describing how this stipulation about erasing threshold when it is inconsistent was omitted from the specification language, (b) suggest a change in tracing/sdk.md we can make to the experimental specification?

Contributor Author

Created this issue: open-telemetry/opentelemetry-specification#5035

Please feel free to assign it to me.

Contributor

I must retract what I said above. See open-telemetry/opentelemetry-specification#5035 (comment)

Contributor Author

We discussed this in today's sampling SIG meeting. The resolution is to keep the spec as it is and conform to the spec. @MrAlias thanks for keeping us adhering to the spec.

Contributor Author

The spec says:

> the SDK SHOULD issue a warning statement in its log with a compatibility warning

This is bound to be very chatty, because a missing randomness flag is unlikely to be a rare event once it starts occurring. Opinions?
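
One possible way to limit the noise, offered only as a sketch and not as anything decided in this thread, is to emit the compatibility warning at most once per sampler instance. The type onceWarner and its emit callback are hypothetical; a real implementation would route the message through whatever logging facility the SDK uses.

```go
package main

import (
	"fmt"
	"sync"
)

// onceWarner emits a warning at most once during its lifetime, no matter how
// many times or from how many goroutines Warn is called.
// (Hypothetical type for illustration only.)
type onceWarner struct {
	once sync.Once
	emit func(msg string)
}

// Warn forwards msg to emit on the first call and ignores later calls.
func (w *onceWarner) Warn(msg string) {
	w.once.Do(func() { w.emit(msg) })
}

func main() {
	w := &onceWarner{emit: func(msg string) { fmt.Println("WARN:", msg) }}
	for i := 0; i < 3; i++ {
		w.Warn("Random flag not set; trace ID randomness is presumed")
	}
}
```

The trade-off is that a once-only warning says nothing about how often the condition occurs; a counter or a time-based rate limit would be alternatives.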

Comment thread sdk/trace/x/sampler_test.go
Comment thread sdk/trace/x/tracestate.go
Comment thread sdk/trace/x/README.md Outdated
Comment thread CHANGELOG.md Outdated
@yuanyuanzhao3
Contributor Author

FYI, the ProbabilitySampler spec revision is here: open-telemetry/opentelemetry-specification#5039

yuanyuanzhao3 and others added 2 commits April 23, 2026 15:48
Always put the `th` value in `tracestate`, even when the randomness indication is missing. This conforms to the presumption of TraceID randomness, which supports extrapolating span metrics even when only the current service uses W3C Trace Context Level 2.

Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Adds an experimental threshold-based probabilistic sampler (ProbabilitySampler) under go.opentelemetry.io/otel/sdk/trace/x, along with tracestate helpers and tests, aiming to align with the OpenTelemetry spec’s deterministic sampling algorithm and ot tracestate propagation.

Changes:

  • Introduce sdk/trace/x experimental package with ProbabilitySampler implementation.
  • Add ot tracestate helpers for th (threshold) and rv (randomness) handling plus unit tests.
  • Document the experimental feature and record it in the changelog.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

File | Description
sdk/trace/x/sampler.go | Implements ProbabilitySampler and tracestate mutation during sampling.
sdk/trace/x/sampler_test.go | Adds behavioral tests for probability thresholds, inclusivity, and tracestate updates.
sdk/trace/x/tracestate.go | Adds helpers for parsing rv and inserting/updating/erasing th within ot tracestate value.
sdk/trace/x/tracestate_test.go | Tests rv parsing and th insert/erase behavior.
sdk/trace/x/README.md | Documents the experimental package and ProbabilitySampler usage.
CHANGELOG.md | Adds an entry for the new experimental sampler.

Comment thread sdk/trace/x/sampler.go
Comment on lines +47 to +55
var randomness uint64
var hasRandomness bool
if existingOtts != "" {
	randomness, hasRandomness = tracestateRandomness(existingOtts)
}

if !hasRandomness {
	randomness = binary.BigEndian.Uint64(p.TraceID[8:16]) & randomnessMask
}
Comment thread sdk/trace/x/tracestate.go
Comment on lines +59 to +67
if len(otts) < start+14 || (len(otts) > start+14 && otts[start+14] != ';') {
	otel.Handle(fmt.Errorf("could not parse tracestate randomness: %s", otts))
	return 0, false
}

rv, err := strconv.ParseUint(otts[start:start+14], 16, 56)
if err != nil {
	otel.Handle(fmt.Errorf("could not parse tracestate randomness: %s", otts))
	return 0, false
Comment on lines +109 to +112
for range numTraces {
	traceID := trace.TraceID{}
	_, _ = rand.Read(traceID[:])
	params := sdktrace.SamplingParameters{
Comment thread sdk/trace/x/sampler.go
Comment on lines +41 to +70
func (ps *probabilitySampler) ShouldSample(p sdktrace.SamplingParameters) sdktrace.SamplingResult {
	psc := trace.SpanContextFromContext(p.ParentContext)
	state := psc.TraceState()

	existingOtts := state.Get("ot")

	var randomness uint64
	var hasRandomness bool
	if existingOtts != "" {
		randomness, hasRandomness = tracestateRandomness(existingOtts)
	}

	if !hasRandomness {
		randomness = binary.BigEndian.Uint64(p.TraceID[8:16]) & randomnessMask
	}

	if ps.threshold > randomness {
		return sdktrace.SamplingResult{
			Decision: sdktrace.Drop,
			Tracestate: state,
		}
	}

	newOtts := InsertOrUpdateTraceStateThKeyValue(existingOtts, ps.thkv)

	if newOtts == "" {
		state = state.Delete("ot")
		return sdktrace.SamplingResult{Decision: sdktrace.RecordAndSample, Tracestate: state}
	}
	combined, err := state.Insert("ot", newOtts)


5 participants