fix: OTLP logging improvements#3343
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3343 +/- ##
=======================================
- Coverage 82.2% 82.0% -0.2%
=======================================
Files 128 128
Lines 24506 24553 +47
=======================================
+ Hits 20147 20155 +8
- Misses 4359 4398 +39 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| let response = client.send_bytes(request).await.map_err(|e| { | ||
| HttpExportError::new(0, format!("Network error: {e:?}")) // Network error | ||
| otel_warn!( | ||
| name: "HttpClient.ExportFailed", |
There was a problem hiding this comment.
We are using the same event name HttpClient.ExportFailed for two different kinds of errors:
- Network error from
client.send_bytes - HTTP status error when
response.status()is not successful.
Maybe we could use different event names for better diagnosability: HttpClient.NetworkError and HttpClientStatusError.
There was a problem hiding this comment.
I did consider that and ended up with same one. I am not 100% happy with this either.
I tried to distinguish between failures where we don't get a response at all and the ones where we get response, but it has status code indicating failure. But then realized that request client don't expose us a way to obtain StatusCode from its Error. Without having firmer control over the client, this is a decent tradeoff.
(The otel-http crate could offer additional logging, as it knows internals of the client it uses, but OTLP exporter does not know that much).
let me know if we should try to make this better or this is acceptable tradeoff.
There was a problem hiding this comment.
Pull request overview
This PR addresses a security vulnerability where authentication tokens and other sensitive information could be logged at ERROR/WARN levels during OTLP export failures. The fix implements a two-tier logging approach: non-sensitive error indicators (status codes, URLs) are logged at WARN level for operational visibility, while potentially sensitive details (error messages, response bodies) are logged only at DEBUG level.
Changes:
- Modified gRPC exporters (traces, metrics, logs) to split error logging into WARN-level status codes and DEBUG-level detailed messages
- Updated HTTP exporter to separate status codes/URLs (WARN) from error details/response bodies (DEBUG)
- Simplified error messages returned to callers to prevent sensitive data propagation upstream
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| opentelemetry-otlp/src/exporter/tonic/trace.rs | Adds two-tier logging for trace export failures: WARN for gRPC codes, DEBUG for sensitive message/details |
| opentelemetry-otlp/src/exporter/tonic/metrics.rs | Adds two-tier logging for metrics export failures: WARN for gRPC codes, DEBUG for sensitive message/details |
| opentelemetry-otlp/src/exporter/tonic/logs.rs | Adds two-tier logging for logs export failures: WARN for gRPC codes, DEBUG for sensitive message/details |
| opentelemetry-otlp/src/exporter/http/mod.rs | Adds two-tier logging for HTTP export failures: WARN for status/URL, DEBUG for error details/response body |
| opentelemetry-otlp/CHANGELOG.md | Documents the security improvement and references issue #3021 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
krisztianfekete
left a comment
There was a problem hiding this comment.
I like this direction, but as discussed in #3331 I think we can improve the UX where one have to enable DEBUG level logging to see that the problem is a user-side misconfiguration that is easily fixable. I am proposing surfacing the ConnectError from the above example at an elevated log level. In this case, further sanitization isn't even required as that message doesn't even contain the actual connection string.
cc. @cijothomas
Co-authored-by: Utkarsh Umesan Pillai <[email protected]>
Co-authored-by: Utkarsh Umesan Pillai <[email protected]>
Co-authored-by: Utkarsh Umesan Pillai <[email protected]>
| // For auth errors (Unauthenticated, PermissionDenied), the message may contain | ||
| // sensitive information, so we only log the code at WARN level. | ||
| let code = tonic_status.code(); | ||
| let is_connection_error = matches!( |
There was a problem hiding this comment.
nit - this ~35-40 lines of code seems to be repeated for all the signals - can have a shared helper method.
| ); | ||
| } | ||
| Err(OTelSdkError::InternalFailure( | ||
| "Metrics export failed".into(), |
There was a problem hiding this comment.
grpc_code should also be added here?
| ); | ||
| // Convert interceptor errors to tonic::Status for retry classification | ||
| tonic::Status::internal(format!("interceptor error: {e:?}")) | ||
| tonic::Status::internal("Traces export failed in interceptor") |
can you recheck now? I made some changes to help. |
@cijothomas It looks like the source error is still not logged at WARN/ERROR level, although we have some additional gRPC context for some codes. Do you think we can surface the source error message at a higher level without losing it? |
Fixes #3021
Supersedes #3108