-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Distribute internal metrics across different levels #9767
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Distribute internal metrics across different levels #9767
Conversation
7fc0d50 to
d2068ac
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9767 +/- ##
==========================================
- Coverage 91.79% 91.73% -0.06%
==========================================
Files 358 358
Lines 16576 16584 +8
==========================================
- Hits 15216 15214 -2
- Misses 1037 1042 +5
- Partials 323 328 +5 ☔ View full report in Codecov by Sentry. |
d2068ac to
2218e9b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this interact with #9510 (comment) ?
I think we should make a decision on whether to remove Metrics Level and if so what's our plan until we remove it
|
I'm ok with having other configs for that, like views. AFAIU, this isn't possible even with |
|
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks ok, just one question
processor/batchprocessor/metrics.go
Outdated
| meter metric.Meter | ||
| ) | ||
|
|
||
| // BatchProcessor metrics are not subject to the same level of filtering as other components. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this true for all components? If the default is normal, aren't all metrics (other than instrumentation library metrics) emitted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this true for all components?
Yes, it's true for all components. With this comment, I wanted to say that the batch processor is the only component in core emitting metrics (with the normal level). It's probably confusing. Let me update it
If the default is normal, aren't all metrics (other than instrumentation library metrics) emitted?
Right
2218e9b to
2357865
Compare
The internal metrics levels are updated along with reported metrics:
- The default level is changed from `basic` to `normal`, which can be overridden with `service::telmetry::metrics::level` configuration.
- Batch processor metrics are updated to be reported starting from `normal` level:
- `processor_batch_batch_send_size`
- `processor_batch_metadata_cardinality`
- `processor_batch_timeout_trigger_send`
- `processor_batch_size_trigger_send`
- GRPC/HTTP server and client metrics are updated to be reported starting from `detailed` level:
- http.client.* metrics
- http.server.* metrics
- rpc.server.* metrics
- rpc.client.* metrics
2357865 to
ee3f4bc
Compare
…l guidelines (#12525) #### Description This PR: - requires "level: normal" before outputting batch processor metrics (in addition to one specific metric which was already restricted to "level: detailed") - clarifies wording in the telemetry level guidelines and documentation, and adds said guidelines to the requirements for stable components. Some rationale for these changes can be found in the tracking issue and [this comment](#7890 (comment)). #### Link to tracking issue Resolves #7890 #### To be discussed Should we add a feature gate for this, in case a user relies on "level: basic" outputting batch processor metrics? This feels like a niche use case, so considering the "alpha" stability level of these metrics, I don't think it's really necessary. Considering batch processor metrics had already been switched to "normal" once (#9767), but were turned back to basic at some later point (not sure when), we might also want to add tests to avoid further regressions (especially as the handling of telemetry levels is bound to change further with #11754). --------- Co-authored-by: Dmitrii Anoshin <[email protected]>
Description:
This change distributes the reported internal metrics across available levels and updates the level set by default:
The default level is changed from
basictonormal, which can be overridden withservice::telmetry::metrics::levelconfiguration.The following batch processor metrics are updated to be reported starting from
normallevel instead ofbasiclevel:processor_batch_batch_send_sizeprocessor_batch_metadata_cardinalityprocessor_batch_timeout_trigger_sendprocessor_batch_size_trigger_senddetailedlevel:http.client.*metricshttp.server.*metricsrpc.server.*metricsrpc.client.*metricsLink to tracking Issue: #7890