fix(tracing): install W3C TraceContext propagator in OTLP path#532
The OTLP init path built the tracer provider and wired it globally, but never called `set_text_map_propagator`. The `GoogleStackDriverMakeSpan` used by `tower_http::TraceLayer` in the RPC server calls `get_text_map_propagator` to extract the inbound `traceparent`. When nothing has been installed, that call silently falls back to a no-op propagator, so every inbound request started a fresh root trace even when the caller sent a trace context.

The `gcloud` init path already installs a `GoogleTraceContextPropagator`, which is why distributed tracing works under that backend. Adding the equivalent W3C `TraceContextPropagator` for OTLP restores cross-service trace chaining.

Verified end-to-end against a local Jaeger v2.17 OTLP collector: before this fix, sending `traceparent: 00-<id>-<span>-01` to katana produced a random trace_id; after it, the exported `http_request` span carries the caller's `<id>` unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
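The failure mode described above can be modeled with std-only Rust. This is a sketch, not the `opentelemetry` crate's API: `TextMapPropagator`, `NoopPropagator`, `W3cPropagator`, `install_propagator`, and `extract_from_global` are hypothetical stand-ins for a process-global propagator slot that defaults to a no-op. The header value is the example `traceparent` from the W3C Trace Context specification.

```rust
use std::collections::HashMap;
use std::sync::{OnceLock, RwLock};

/// Parent context recovered from an inbound `traceparent` header.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SpanContext {
    trace_id: String,
    parent_span_id: String,
}

/// Hypothetical stand-in for a text-map propagator (not the opentelemetry trait).
trait TextMapPropagator: Send + Sync {
    fn extract(&self, headers: &HashMap<String, String>) -> Option<SpanContext>;
}

/// What the global slot holds when nothing is installed: ignores every header.
struct NoopPropagator;

impl TextMapPropagator for NoopPropagator {
    fn extract(&self, _headers: &HashMap<String, String>) -> Option<SpanContext> {
        None
    }
}

/// Reads a version-00 W3C `traceparent`: "00-<32 hex>-<16 hex>-<2 hex>".
struct W3cPropagator;

fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b))
}

impl TextMapPropagator for W3cPropagator {
    fn extract(&self, headers: &HashMap<String, String>) -> Option<SpanContext> {
        let parts: Vec<&str> = headers.get("traceparent")?.trim().split('-').collect();
        if parts.len() != 4 || parts[0] != "00" {
            return None;
        }
        let (trace_id, span_id, flags) = (parts[1], parts[2], parts[3]);
        let shape_ok =
            is_lower_hex(trace_id, 32) && is_lower_hex(span_id, 16) && is_lower_hex(flags, 2);
        // The spec treats all-zero trace and parent IDs as invalid.
        let non_zero = trace_id.bytes().any(|b| b != b'0') && span_id.bytes().any(|b| b != b'0');
        if !(shape_ok && non_zero) {
            return None;
        }
        Some(SpanContext { trace_id: trace_id.to_string(), parent_span_id: span_id.to_string() })
    }
}

/// Process-global slot, lazily initialised to the no-op propagator.
fn global() -> &'static RwLock<Box<dyn TextMapPropagator>> {
    static GLOBAL: OnceLock<RwLock<Box<dyn TextMapPropagator>>> = OnceLock::new();
    GLOBAL.get_or_init(|| {
        let noop: Box<dyn TextMapPropagator> = Box::new(NoopPropagator);
        RwLock::new(noop)
    })
}

/// Replaces whatever propagator is currently installed.
fn install_propagator(p: impl TextMapPropagator + 'static) {
    let boxed: Box<dyn TextMapPropagator> = Box::new(p);
    *global().write().unwrap() = boxed;
}

/// Extracts the parent context using the currently installed propagator.
fn extract_from_global(headers: &HashMap<String, String>) -> Option<SpanContext> {
    global().read().unwrap().extract(headers)
}

fn main() {
    let mut headers = HashMap::new();
    headers.insert(
        "traceparent".to_string(),
        "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01".to_string(),
    );
    // Before the fix: nothing installed, so the caller's context is silently dropped.
    println!("before: {:?}", extract_from_global(&headers));
    // After the fix: the OTLP init path installs a W3C propagator.
    install_propagator(W3cPropagator);
    println!("after:  {:?}", extract_from_global(&headers));
}
```

Before `install_propagator` runs, extraction yields `None`, so a span parented on that result has no remote parent and becomes a new root; afterwards the caller's trace ID comes through unchanged.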
Benchmark
| Benchmark suite | Current: a932cc9 | Previous: 3e40686 | Ratio |
|---|---|---|---|
| CompiledClass(fixture)/compress | 2729611 ns/iter (± 7686) | 2576810 ns/iter (± 7726) | 1.06 |
| CompiledClass(fixture)/decompress | 2887429 ns/iter (± 15145) | 2833303 ns/iter (± 14941) | 1.02 |
| ExecutionCheckpoint/compress | 32 ns/iter (± 8) | 36 ns/iter (± 8) | 0.89 |
| ExecutionCheckpoint/decompress | 25 ns/iter (± 7) | 26 ns/iter (± 3) | 0.96 |
| PruningCheckpoint/compress | 32 ns/iter (± 7) | 36 ns/iter (± 7) | 0.89 |
| PruningCheckpoint/decompress | 25 ns/iter (± 4) | 26 ns/iter (± 8) | 0.96 |
| VersionedHeader/compress | 643 ns/iter (± 6) | 660 ns/iter (± 13) | 0.97 |
| VersionedHeader/decompress | 809 ns/iter (± 29) | 865 ns/iter (± 16) | 0.94 |
| StoredBlockBodyIndices/compress | 77 ns/iter (± 7) | 81 ns/iter (± 10) | 0.95 |
| StoredBlockBodyIndices/decompress | 36 ns/iter (± 9) | 40 ns/iter (± 7) | 0.90 |
| StorageEntry/compress | 152 ns/iter (± 4) | 159 ns/iter (± 2) | 0.96 |
| StorageEntry/decompress | 141 ns/iter (± 3) | 157 ns/iter (± 3) | 0.90 |
| ContractNonceChange/compress | 151 ns/iter (± 3) | 159 ns/iter (± 1) | 0.95 |
| ContractNonceChange/decompress | 242 ns/iter (± 10) | 257 ns/iter (± 4) | 0.94 |
| ContractClassChange/compress | 225 ns/iter (± 3) | 213 ns/iter (± 4) | 1.06 |
| ContractClassChange/decompress | 251 ns/iter (± 3) | 279 ns/iter (± 6) | 0.90 |
| ContractStorageEntry/compress | 158 ns/iter (± 7) | 170 ns/iter (± 2) | 0.93 |
| ContractStorageEntry/decompress | 309 ns/iter (± 6) | 348 ns/iter (± 6) | 0.89 |
| GenericContractInfo/compress | 137 ns/iter (± 5) | 139 ns/iter (± 8) | 0.99 |
| GenericContractInfo/decompress | 102 ns/iter (± 3) | 109 ns/iter (± 3) | 0.94 |
| Felt/compress | 81 ns/iter (± 7) | 93 ns/iter (± 8) | 0.87 |
| Felt/decompress | 56 ns/iter (± 4) | 65 ns/iter (± 6) | 0.86 |
| BlockHash/compress | 82 ns/iter (± 5) | 93 ns/iter (± 7) | 0.88 |
| BlockHash/decompress | 54 ns/iter (± 4) | 63 ns/iter (± 5) | 0.86 |
| TxHash/compress | 81 ns/iter (± 6) | 93 ns/iter (± 7) | 0.87 |
| TxHash/decompress | 55 ns/iter (± 4) | 63 ns/iter (± 5) | 0.87 |
| ClassHash/compress | 81 ns/iter (± 1) | 93 ns/iter (± 12) | 0.87 |
| ClassHash/decompress | 54 ns/iter (± 5) | 63 ns/iter (± 4) | 0.86 |
| CompiledClassHash/compress | 81 ns/iter (± 6) | 93 ns/iter (± 7) | 0.87 |
| CompiledClassHash/decompress | 54 ns/iter (± 4) | 63 ns/iter (± 4) | 0.86 |
| BlockNumber/compress | 47 ns/iter (± 2) | 50 ns/iter (± 2) | 0.94 |
| BlockNumber/decompress | 25 ns/iter (± 0) | 26 ns/iter (± 0) | 0.96 |
| TxNumber/compress | 47 ns/iter (± 2) | 50 ns/iter (± 2) | 0.94 |
| TxNumber/decompress | 26 ns/iter (± 0) | 26 ns/iter (± 0) | 1 |
| FinalityStatus/compress | 1 ns/iter (± 0) | 0 ns/iter (± 0) | +∞ |
| FinalityStatus/decompress | 12 ns/iter (± 1) | 12 ns/iter (± 0) | 1 |
| TypedTransactionExecutionInfo/compress | 16208 ns/iter (± 1907) | 15017 ns/iter (± 36) | 1.08 |
| TypedTransactionExecutionInfo/decompress | 3577 ns/iter (± 116) | 3766 ns/iter (± 71) | 0.95 |
| VersionedContractClass/compress | 359 ns/iter (± 5) | 362 ns/iter (± 5) | 0.99 |
| VersionedContractClass/decompress | 771 ns/iter (± 32) | 854 ns/iter (± 4) | 0.90 |
| MigratedCompiledClassHash/compress | 161 ns/iter (± 4) | 178 ns/iter (± 4) | 0.90 |
| MigratedCompiledClassHash/decompress | 144 ns/iter (± 9) | 159 ns/iter (± 4) | 0.91 |
| ContractInfoChangeList/compress | 1721 ns/iter (± 40) | 1576 ns/iter (± 112) | 1.09 |
| ContractInfoChangeList/decompress | 2228 ns/iter (± 379) | 2338 ns/iter (± 378) | 0.95 |
| BlockChangeList/compress | 765 ns/iter (± 42) | 704 ns/iter (± 48) | 1.09 |
| BlockChangeList/decompress | 898 ns/iter (± 155) | 945 ns/iter (± 156) | 0.95 |
| ReceiptEnvelope/compress | 30680 ns/iter (± 2552) | 28166 ns/iter (± 921) | 1.09 |
| ReceiptEnvelope/decompress | 6109 ns/iter (± 253) | 6604 ns/iter (± 251) | 0.93 |
| TrieDatabaseValue/compress | 164 ns/iter (± 3) | 159 ns/iter (± 1) | 1.03 |
| TrieDatabaseValue/decompress | 227 ns/iter (± 2) | 260 ns/iter (± 9) | 0.87 |
| TrieHistoryEntry/compress | 291 ns/iter (± 7) | 305 ns/iter (± 35) | 0.95 |
| TrieHistoryEntry/decompress | 261 ns/iter (± 10) | 289 ns/iter (± 11) | 0.90 |
This comment was automatically generated by workflow using github-action-benchmark.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.30.
| Benchmark suite | Current: a932cc9 | Previous: 3e40686 | Ratio |
|---|---|---|---|
| FinalityStatus/compress | 1 ns/iter (± 0) | 0 ns/iter (± 0) | +∞ |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @kariy
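The Ratio column in both tables is consistent with current ÷ previous (for example, 2729611 / 2576810 ≈ 1.06), and the alert fires when that ratio exceeds 1.30. The sketch below reproduces that check under this assumption; it is not github-action-benchmark's own code. It shows why `FinalityStatus/compress` trips the alert: the previous result is 0 ns/iter, and in IEEE 754 floating point a positive value divided by zero is +∞, which exceeds any finite threshold.

```rust
/// Ratio as shown in the tables: current ns/iter divided by previous ns/iter.
fn ratio(current_ns: f64, previous_ns: f64) -> f64 {
    // IEEE 754: a positive value divided by 0.0 is +infinity.
    current_ns / previous_ns
}

/// Flags a regression when the ratio exceeds the alert threshold.
fn is_regression(current_ns: f64, previous_ns: f64, threshold: f64) -> bool {
    ratio(current_ns, previous_ns) > threshold
}

fn main() {
    let threshold = 1.30; // threshold quoted in the alert
    let rows = [
        ("CompiledClass(fixture)/compress", 2729611.0, 2576810.0),
        ("ReceiptEnvelope/compress", 30680.0, 28166.0),
        ("FinalityStatus/compress", 1.0, 0.0),
    ];
    for (name, current, previous) in rows {
        println!(
            "{}: ratio={:.2} alert={}",
            name,
            ratio(current, previous),
            is_regression(current, previous, threshold)
        );
    }
}
```

Because the baseline rounded to 0 ns/iter, this particular alert likely reflects timer resolution rather than a measurable slowdown.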
…led revs

- Add a "Distributed tracing" section to docs/cartridge.md covering the wire protocol per service (gRPC for katana/vrf-server, HTTP for paymaster-service), enable flags, span inventory, the paymaster's `prometheus`-named-but-OTLP config quirk, and an end-to-end repro.
- Bump sidecar pins to the tracing-enabled commits on main:
  - paymaster-service: 4748365 -> 8fc62f2 (cartridge-gg/paymaster#15)
  - vrf-server: 6d1c0f6 -> d7a1f9a (cartridge-gg/vrf#46)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Codecov Report
❌ Patch coverage is

Coverage Diff

| | main | #532 | +/- |
|---|---|---|---|
| Coverage | 73.32% | 66.45% | -6.87% |
| Files | 209 | 308 | +99 |
| Lines | 23132 | 41843 | +18711 |
| Hits | 16961 | 27806 | +10845 |
| Misses | 6171 | 14037 | +7866 |
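The headline percentages follow from Hits divided by Lines; this diff lists no partial lines, so Lines = Hits + Misses in both columns. A small sketch of that arithmetic, assuming that formula:

```rust
/// Line coverage in percent: hits / lines * 100 (no partial lines assumed).
fn coverage_pct(hits: u64, lines: u64) -> f64 {
    if lines == 0 {
        return 0.0;
    }
    hits as f64 / lines as f64 * 100.0
}

fn main() {
    let (main_hits, main_misses) = (16961_u64, 6171_u64);
    let (pr_hits, pr_misses) = (27806_u64, 14037_u64);
    // Lines = Hits + Misses for both columns of the diff.
    let base = coverage_pct(main_hits, main_hits + main_misses);
    let head = coverage_pct(pr_hits, pr_hits + pr_misses);
    println!("main {:.2}% | #532 {:.2}% | {:+.2}", base, head, head - base);
}
```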
Summary
The OTLP init path never installed a text-map propagator, so katana silently dropped inbound `traceparent` headers under that backend: every request became a fresh root trace even when the caller had a live trace context. The `gcloud` path installs a `GoogleTraceContextPropagator`; OTLP was missing the equivalent W3C one.

One-line fix: add `opentelemetry::global::set_text_map_propagator(TraceContextPropagator::new())` in `otlp::init_tracer`.

Why this matters
`crates/rpc/rpc-server/src/middleware/...` wraps incoming HTTP in a `tower_http::TraceLayer` configured with `GoogleStackDriverMakeSpan`, which calls `opentelemetry::global::get_text_map_propagator(|p| p.extract(&HeaderExtractor(req.headers())))`. With no propagator installed, `get_text_map_propagator` hands the closure a `NoopTextMapPropagator`, and the extract call silently produces an empty context. As a result, `span.set_parent(cx)` is effectively a no-op, and the exported span starts a new trace instead of chaining to the caller's.

Repro (before this PR)
After this PR, the exported `http_request` span has `trace_id=0af7651916cd43dd8448eb211c80319c`: the caller's context is preserved, and any spans katana fans out downstream chain under the same trace.

Test plan
- `cargo check -p katana-tracing` clean
- `cargo +nightly-2025-02-20 fmt --all -- --check` clean

Discovered during
Wiring distributed tracing into the cartridge sidecar services (cartridge-gg/vrf#46, cartridge-gg/paymaster#15). Those PRs correctly install the W3C propagator; this PR closes the gap on the katana side so the three-service chain stitches under one trace_id in the collector.
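To illustrate what stitching the chain under one trace_id involves, here is a std-only sketch of inject/extract across a single hop. It assumes, for illustration only, that katana forwards a request to vrf-server; `Span`, `inject`, `start_server_span`, and the counter-based `next_id` are hypothetical and are not katana's, vrf-server's, or opentelemetry's code. Each service keeps the inbound trace ID, mints its own span ID, and records the caller's span ID as its parent; without a usable header it starts a new root, which is the pre-fix behavior.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Debug, Clone)]
struct Span {
    service: &'static str,
    trace_id: String,
    span_id: String,
    parent_span_id: Option<String>,
}

/// Hypothetical ID source: a counter instead of a random generator, to stay std-only.
fn next_id(hex_width: usize) -> String {
    static NEXT: AtomicU64 = AtomicU64::new(1);
    format!("{:0width$x}", NEXT.fetch_add(1, Ordering::Relaxed), width = hex_width)
}

/// Outbound side: write the current span as a W3C `traceparent` (sampled flag set).
fn inject(span: &Span, headers: &mut HashMap<String, String>) {
    headers.insert(
        "traceparent".to_string(),
        format!("00-{}-{}-01", span.trace_id, span.span_id),
    );
}

/// Inbound side: continue the caller's trace if the header is usable, else start a root.
fn start_server_span(service: &'static str, headers: &HashMap<String, String>) -> Span {
    let parent = headers.get("traceparent").and_then(|value| {
        let parts: Vec<&str> = value.split('-').collect();
        // Shape check only (version 00, 32/16 hex-digit IDs); not full spec validation.
        let ok = parts.len() == 4
            && parts[0] == "00"
            && parts[1].len() == 32
            && parts[2].len() == 16
            && parts[1..3].iter().all(|p| p.bytes().all(|b| b.is_ascii_hexdigit()));
        if ok { Some((parts[1].to_string(), parts[2].to_string())) } else { None }
    });
    match parent {
        Some((trace_id, parent_id)) => Span {
            service,
            trace_id,
            span_id: next_id(16),
            parent_span_id: Some(parent_id),
        },
        None => Span { service, trace_id: next_id(32), span_id: next_id(16), parent_span_id: None },
    }
}

fn main() {
    // Inbound request from a caller that already has a trace context.
    let mut inbound = HashMap::new();
    inbound.insert(
        "traceparent".to_string(),
        "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01".to_string(),
    );
    let katana = start_server_span("katana", &inbound);

    // Hypothetical downstream hop from katana to vrf-server.
    let mut outbound = HashMap::new();
    inject(&katana, &mut outbound);
    let vrf = start_server_span("vrf-server", &outbound);

    for span in [&katana, &vrf] {
        println!(
            "{:<10} trace={} span={} parent={:?}",
            span.service, span.trace_id, span.span_id, span.parent_span_id
        );
    }
}
```

Both spans share the caller's trace ID, and vrf-server's parent is katana's span, so a collector can assemble them into one trace.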
🤖 Generated with Claude Code