-
Notifications
You must be signed in to change notification settings - Fork 639
Description
The span export path has three separate clones/allocations between BatchSpanProcessor and the wire. Together they mean every batch of spans gets fully copied three times before serialization. The log export path already avoids this with borrowed data (LogBatch), making this a trace-only problem.
I found these by reading the code, not with a profiler, but each one is unambiguous.
Clone 1: batch.split_off(0) in BatchSpanProcessor
In opentelemetry-sdk/src/trace/span_processor.rs, export_batch_sync() does:
let export = exporter.export(batch.split_off(0));
split_off(0) allocates a brand new Vec and moves all elements into it on every single export call. The existing code even has a TODO comment acknowledging this:
// TODO: Compared to Logs, this requires new allocation for vec for
// every export. See if this can be optimized by
// *not* requiring ownership in the exporter.
This happens because SpanExporter::export() takes Vec (owned), unlike LogExporter::export() which takes LogBatch<'_> (borrowed).
Clone 2: deep clone of entire batch per retry in tonic exporter
In opentelemetry-otlp/src/exporter/tonic/trace.rs, the export() method wraps the batch in Arc for retry support (line 75), then on every retry iteration clones the entire Vec:
let resource_spans = group_spans_by_resource_and_scope((*batch_clone).clone(), &self.resource);
Even on the happy path with zero retries, this clone happens once. With retries it happens N+1 times. Each clone deep-copies every SpanData including all attributes, events, and links.
Clone 3: per-span clone during proto conversion
In opentelemetry-proto/src/transform/trace.rs, group_spans_by_resource_and_scope() groups spans by scope using references, but then clones each span individually during the From conversion:
spans: span_records.into_iter().map(|span_data| span_data.clone().into()).collect(),
This happens because the grouping step borrows &SpanData but the proto conversion (From for Span) requires ownership.
Contrast with LogExporter
LogExporter (opentelemetry-sdk/src/logs/export.rs) uses a completely different design:
fn export(&self, batch: LogBatch<'_>) -> impl Future<Output = OTelSdkResult> + Send;
LogBatch holds borrowed references. The batch processor can reuse its buffer across exports. The exporter never needs to clone data just to satisfy an ownership requirement. SpanExporter should follow this pattern.
Suggested fix
- Change SpanExporter::export() to take borrowed data (a SpanBatch<'_> similar to LogBatch), so the batch processor can reuse its buffer and exporters don't need ownership of the input.
- Make proto conversion consume data where possible instead of cloning. group_spans_by_resource_and_scope could take ownership of the Vec (or borrow and convert in place) rather than borrow-then-clone.
- For retry support, serialize to bytes once before the retry loop instead of cloning SpanData per attempt. The tonic exporter already has the ExportTraceServiceRequest built before calling client.export() -- that request (or its serialized bytes) could be cached across retries.
Related issues and PRs:
- OTLP Exporter - reuse buffers #3123 (OTLP Exporter - reuse buffers) tracks the general allocation problem but doesn't specifically call out the triple clone in the span path
- perf: Benchmarks for OTLP conversion #3307 (benchmarks for OTLP conversion) is measuring but not fixing
- fix: Modify SpanExporter trait to use immutable references for shutdown and force_flush methods #3066 addresses &self on shutdown/flush but does not touch the export(Vec) ownership problem
- Tracing SDK - Processors get clone of SpanData which is incorrect #2726 tracks SpanData cloning at the processor level (on_end), which is upstream of this but a separate issue