OTel log correlation + distributed tracing #1597

@Richard87

Description

Adopt OpenTelemetry across Radix services so that troubleshooting and performance analysis are easier.

Logs: All services (Radix API, Operator, Pipeline Runner) must include OTel correlation fields (trace_id, span_id) in every log line.
Tracing: Create and propagate a single trace from an incoming Radix API request through job creation/processing (e.g., RadixJob) across services.
Propagation: Ensure outbound calls to third parties carry the same trace context.
Spans: Add child spans for long-running steps (e.g., build jobs).
Export: Send spans to a tracing collector (Tempo/Jaeger; optionally evaluate Azure Log Analytics).
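
A minimal sketch of the Logs item, assuming the services log through Go's standard `log/slog` (they may well use a different logger, in which case the same idea applies as a hook or middleware). `TraceInfo`, `WithTrace`, `TraceFrom`, `correlationHandler`, and `demoLine` are hypothetical names for illustration. With the OpenTelemetry Go SDK the IDs would come from the span context stored in `ctx` rather than from a hand-rolled context key.

```go
package main

import (
	"context"
	"fmt"
	"log/slog"
	"strings"
)

// TraceInfo holds the IDs of the active span (hypothetical stand-in for
// the span context an OTel SDK would keep in the request context).
type TraceInfo struct {
	TraceID string
	SpanID  string
}

type traceKey struct{}

// WithTrace returns a context that carries the given trace identifiers.
func WithTrace(ctx context.Context, ti TraceInfo) context.Context {
	return context.WithValue(ctx, traceKey{}, ti)
}

// TraceFrom extracts the trace identifiers from ctx, if any.
func TraceFrom(ctx context.Context) (TraceInfo, bool) {
	ti, ok := ctx.Value(traceKey{}).(TraceInfo)
	return ti, ok
}

// correlationHandler wraps any slog.Handler and appends trace_id and
// span_id to every record whose context carries a TraceInfo.
type correlationHandler struct {
	slog.Handler
}

func (h correlationHandler) Handle(ctx context.Context, r slog.Record) error {
	if ti, ok := TraceFrom(ctx); ok {
		r = r.Clone() // do not mutate state shared with other copies
		r.AddAttrs(
			slog.String("trace_id", ti.TraceID),
			slog.String("span_id", ti.SpanID),
		)
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the inner handler so the correlation
// fields survive logger.With(...) and logger.WithGroup(...).
// Caveat: after WithGroup, the two fields are nested inside that group.
func (h correlationHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return correlationHandler{Handler: h.Handler.WithAttrs(attrs)}
}

func (h correlationHandler) WithGroup(name string) slog.Handler {
	return correlationHandler{Handler: h.Handler.WithGroup(name)}
}

// demoLine logs one JSON line with example IDs and returns it.
func demoLine() string {
	var out strings.Builder
	logger := slog.New(correlationHandler{Handler: slog.NewJSONHandler(&out, nil)})
	ctx := WithTrace(context.Background(), TraceInfo{
		TraceID: "4bf92f3577b34da6a3ce929d0e0e4736",
		SpanID:  "00f067aa0ba902b7",
	})
	logger.InfoContext(ctx, "RadixJob created")
	return out.String()
}

func main() {
	fmt.Print(demoLine())
}
```

The fields are only added when the log call receives the request context (`InfoContext` and friends), so call sites that log without a context would need to be updated as part of this work.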

Result:
We can trace all logs and actions for one user request across most services and use spans for latency/bottleneck analysis.
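
To illustrate what "one trace across services" means on the wire, here is a stdlib-only sketch of the W3C Trace Context `traceparent` header (version `00` only): parse the incoming header, keep the trace ID, mint a new span ID for the next hop, and set the header on the outbound request. `TraceContext`, `ParseTraceparent`, `Child`, and `Header` are hypothetical names, and the URL is a placeholder. In practice the OpenTelemetry SDK's propagators would do this rather than hand-written code.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// TraceContext is the parsed content of a version-00 traceparent header.
type TraceContext struct {
	TraceID string // 32 lowercase hex chars, shared by every hop
	SpanID  string // 16 lowercase hex chars, unique per span
	Flags   string // 2 lowercase hex chars (01 = sampled)
}

func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !(c >= '0' && c <= '9') && !(c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// ParseTraceparent validates "00-<trace-id>-<parent-id>-<flags>".
func ParseTraceparent(h string) (TraceContext, error) {
	parts := strings.Split(strings.TrimSpace(h), "-")
	if len(parts) != 4 || parts[0] != "00" {
		return TraceContext{}, errors.New("unsupported traceparent format")
	}
	if !isLowerHex(parts[1], 32) || !isLowerHex(parts[2], 16) || !isLowerHex(parts[3], 2) {
		return TraceContext{}, errors.New("malformed traceparent field")
	}
	// All-zero trace or span IDs are invalid per the specification.
	if strings.Trim(parts[1], "0") == "" || strings.Trim(parts[2], "0") == "" {
		return TraceContext{}, errors.New("all-zero trace or span id")
	}
	return TraceContext{TraceID: parts[1], SpanID: parts[2], Flags: parts[3]}, nil
}

// Child keeps the trace ID and flags but generates a fresh random span ID.
func (tc TraceContext) Child() TraceContext {
	b := make([]byte, 8)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return TraceContext{TraceID: tc.TraceID, SpanID: hex.EncodeToString(b), Flags: tc.Flags}
}

// Header renders the context as a traceparent header value.
func (tc TraceContext) Header() string {
	return "00-" + tc.TraceID + "-" + tc.SpanID + "-" + tc.Flags
}

func main() {
	// Header as it might arrive on an incoming Radix API request.
	in, err := ParseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		panic(err)
	}
	// Outbound call to another service: same trace ID, new span ID.
	req, err := http.NewRequest(http.MethodGet, "https://example.invalid/jobs", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("traceparent", in.Child().Header())
	fmt.Println(req.Header.Get("traceparent"))
}
```

Because the trace ID survives every hop, filtering logs by `trace_id` and opening the same ID in the tracing backend should show one request end to end.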
