Observability Edge cases and gaps #3852

@radisicc

Typos in code examples

  • In the complete Python example, the triple-quoted string in code_run is delimited with "" instead of """.
  • The TypeScript example calls console.log`Created sandbox: ...` (a tagged template literal) instead of console.log(...). This appears twice.

Structural suggestion
The page covers two distinct features: sandbox telemetry collection (configured in dashboard settings) and SDK-level tracing (configured in code). The relationship between them could be clearer upfront. A brief "two types of observability" overview before diving in would help, since a user might land here wanting one and get confused by the other.

Compression
OTEL_EXPORTER_OTLP_COMPRESSION=gzip isn't mentioned. It matters for high-volume production use, where export payloads are large and repetitive.
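
To illustrate why this matters, here is a rough stdlib-only sketch. It sets the standard variable and gzips a made-up batch of span-like records. The record shape and the span name are hypothetical, and real OTLP payloads are protobuf rather than JSON, so the actual ratio will differ.

```python
import gzip
import json
import os

# Ask OTLP exporters that honor the standard variable to gzip their payloads.
os.environ.setdefault("OTEL_EXPORTER_OTLP_COMPRESSION", "gzip")

# Hypothetical span-like records; real OTLP payloads are protobuf, not JSON.
spans = [
    {"name": "sandbox.create", "service": "my-app", "duration_ms": i % 50}
    for i in range(1000)
]
raw = json.dumps(spans).encode("utf-8")
compressed = gzip.compress(raw)

ratio = len(compressed) / len(raw)
print(f"raw={len(raw)} bytes, gzip={len(compressed)} bytes, ratio={ratio:.2f}")
```

Telemetry batches repeat the same attribute keys and service names many times, which is why they tend to compress well.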

Failure behavior
What happens when the OTLP endpoint is unreachable? Does the SDK silently drop spans, queue and retry, or throw? Users need to know this won't break their app if the collector goes down.

Protocol ambiguity
Port 4317 (gRPC) is used for New Relic and 4318 (HTTP) for Jaeger, but the difference is never explained. Users will hit connection errors if they pick the wrong one. Worth noting OTEL_EXPORTER_OTLP_PROTOCOL (grpc vs http/protobuf) and which port maps to which.
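
As a sketch of what the docs could spell out, the helper below pairs each protocol with its conventional OTLP port and rejects an obvious mismatch. The function name otlp_env is hypothetical and not part of any SDK. Only the two environment variable names and the 4317/4318 convention come from OpenTelemetry.

```python
from urllib.parse import urlparse

# Conventional OTLP ports: 4317 for gRPC, 4318 for HTTP.
DEFAULT_OTLP_PORTS = {"grpc": 4317, "http/protobuf": 4318}


def otlp_env(endpoint: str, protocol: str) -> dict:
    """Hypothetical helper: build OTLP env vars, rejecting a port/protocol mismatch."""
    if protocol not in DEFAULT_OTLP_PORTS:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    port = urlparse(endpoint).port
    for other_protocol, other_port in DEFAULT_OTLP_PORTS.items():
        # Flag the common mistake of pointing one protocol at the other's port.
        if other_protocol != protocol and port == other_port:
            raise ValueError(
                f"port {port} is the conventional {other_protocol} port, "
                f"but protocol is {protocol}"
            )
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
        "OTEL_EXPORTER_OTLP_PROTOCOL": protocol,
    }


print(otlp_env("http://localhost:4318", "http/protobuf"))
```

Collectors can be configured to listen on other ports, so the check only catches the default-port mix-up described above.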

TLS for self-hosted setups
No mention of OTEL_EXPORTER_OTLP_INSECURE=true, which people will need for local/dev collectors without TLS. They'll get cryptic connection errors otherwise.
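
A minimal sketch of a local/dev setup follows. The helper name local_dev_env is hypothetical, and whether the SDK actually reads OTEL_EXPORTER_OTLP_INSECURE is an assumption the docs would need to confirm. In the OpenTelemetry specification the variable applies to gRPC exporters.

```python
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def local_dev_env(endpoint: str) -> dict:
    """Hypothetical helper: allow plaintext only for non-HTTPS local collectors."""
    env = {"OTEL_EXPORTER_OTLP_ENDPOINT": endpoint}
    parsed = urlparse(endpoint)
    if parsed.hostname in LOCAL_HOSTS and parsed.scheme != "https":
        # Assumption: the exporter honors this variable for plaintext gRPC.
        env["OTEL_EXPORTER_OTLP_INSECURE"] = "true"
    return env


print(local_dev_env("http://localhost:4317"))
print(local_dev_env("https://collector.example.com:4317"))
```

Restricting the flag to loopback hosts keeps a dev convenience from leaking into a production configuration.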

Service naming
No mention of OTEL_SERVICE_NAME or OTEL_RESOURCE_ATTRIBUTES. Without setting a service name, traces will show up as unknown_service in most backends, which is confusing when you have multiple apps.
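
For example, the docs could show something along these lines. The helper name, the service name, and the attribute values are made up for illustration. The sketch also assumes attribute values contain no commas or equals signs, which would otherwise need escaping.

```python
def service_identity_env(service_name: str, attributes: dict) -> dict:
    """Hypothetical helper: format service identity as OTel env vars."""
    # OTEL_RESOURCE_ATTRIBUTES is a comma-separated list of key=value pairs.
    pairs = ",".join(f"{key}={value}" for key, value in attributes.items())
    return {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_RESOURCE_ATTRIBUTES": pairs,
    }


env = service_identity_env(
    "sandbox-orchestrator",  # made-up service name
    {"service.version": "1.4.2", "deployment.environment": "staging"},
)
for name, value in env.items():
    print(f"{name}={value}")
```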

Sampling config
"Consider configuring trace sampling" is mentioned in best practices but no actual guidance is given. Even a one-liner like OTEL_TRACES_SAMPLER=parentbased_traceidratio and OTEL_TRACES_SAMPLER_ARG=0.1 would help.
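
To make the one-liner concrete, here is a small sketch. sampler_env builds the two variables, and would_sample illustrates the general idea behind ratio sampling (a deterministic decision from the trace ID). Both names are hypothetical, and would_sample is an illustration of the technique, not the SDK's actual sampler code.

```python
import random


def sampler_env(ratio: float) -> dict:
    """Hypothetical helper: env vars for parent-based ratio sampling."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    return {
        "OTEL_TRACES_SAMPLER": "parentbased_traceidratio",
        "OTEL_TRACES_SAMPLER_ARG": str(ratio),
    }


def would_sample(trace_id: int, ratio: float) -> bool:
    """Illustrative ratio check on the low 64 bits of a trace ID."""
    bound = round(ratio * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < bound


# Simulate 20,000 random 128-bit trace IDs and count how many are kept at 10%.
rng = random.Random(0)
trace_ids = [rng.getrandbits(128) for _ in range(20_000)]
kept = sum(would_sample(trace_id, 0.1) for trace_id in trace_ids)
print(sampler_env(0.1))
print(f"kept {kept} of {len(trace_ids)} root traces")
```

Because the decision depends only on the trace ID, every service that sees the same trace reaches the same verdict, and the parent-based wrapper makes child spans follow the root's decision.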

Collector as gateway pattern
Many production setups run an OTEL Collector as a local gateway rather than exporting directly to a vendor. This common architecture is worth a brief note, especially since it's how AWS, GCP, and Azure integrations typically work.


Would be good to add

Additional backend examples for:

  • Datadog
  • Honeycomb
  • Dynatrace
  • AWS CloudWatch / X-Ray
  • Google Cloud Trace
  • Azure Monitor

All of these support OTLP via their own collectors or native endpoints.
