Description
Adopt OpenTelemetry (OTel) across Radix services so that troubleshooting and performance analysis are easier.
Logs: All services (Radix API, Operator, Pipeline Runner) must include OTel correlation fields (trace_id, span_id) in every log line.
Tracing: Create and propagate a single trace from an incoming Radix API request through job creation/processing (e.g., RadixJob) across services.
Propagation: Ensure outbound calls to third parties carry the same trace context.
Spans: Add child spans for long-running steps (e.g., build jobs).
Export: Send spans to a tracing backend (Tempo or Jaeger; optionally evaluate Azure Log Analytics).
Result:
We can trace all logs and actions for one user request across most services and use spans for latency/bottleneck analysis.