@dmitryax dmitryax commented Mar 14, 2024

Description:

This change distributes the reported internal metrics across available levels and updates the level set by default:

  1. The default level is changed from basic to normal, which can be overridden with the service::telemetry::metrics::level configuration.

  2. The following batch processor metrics are updated to be reported starting from normal level instead of basic level:

  • processor_batch_batch_send_size
  • processor_batch_metadata_cardinality
  • processor_batch_timeout_trigger_send
  • processor_batch_size_trigger_send
  3. The following gRPC/HTTP server and client metrics are updated to be reported starting from detailed level:
  • http.client.* metrics
  • http.server.* metrics
  • rpc.server.* metrics
  • rpc.client.* metrics
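
The thresholds listed above amount to an ordered comparison between the configured level and the level each metric requires. The following is a minimal sketch of that idea, not the Collector's actual implementation: the `Level` type, the `minLevel` table, and the `reported` function are hypothetical names introduced here for illustration, and the `none` level is included only for completeness since this PR does not discuss it.

```go
package main

import "sort"

// Level is a hypothetical stand-in for the configured telemetry level.
// Levels are ordered, so a higher level includes everything below it.
type Level int

const (
	LevelNone Level = iota
	LevelBasic
	LevelNormal
	LevelDetailed
)

// minLevel maps each metric (or metric family) named in this PR to the
// lowest level at which it is reported after the change.
var minLevel = map[string]Level{
	"processor_batch_batch_send_size":      LevelNormal,
	"processor_batch_metadata_cardinality": LevelNormal,
	"processor_batch_timeout_trigger_send": LevelNormal,
	"processor_batch_size_trigger_send":    LevelNormal,
	"http.client.*":                        LevelDetailed,
	"http.server.*":                        LevelDetailed,
	"rpc.server.*":                         LevelDetailed,
	"rpc.client.*":                         LevelDetailed,
}

// reported returns, in sorted order, the entries of minLevel that would be
// reported when the configured level is `configured`.
func reported(configured Level) []string {
	var names []string
	for name, required := range minLevel {
		if configured >= required {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}
```

With this table, `reported(LevelBasic)` returns nothing, `reported(LevelNormal)` returns the four batch processor metrics, and `reported(LevelDetailed)` returns all eight entries. The snippet has no `main` function; call `reported` from one to try it.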

Link to tracking Issue: #7890

@dmitryax dmitryax requested review from a team and Aneurysm9 March 14, 2024 23:21
@dmitryax dmitryax force-pushed the move-internal-metrics-to-normal branch 2 times, most recently from 7fc0d50 to d2068ac Compare March 14, 2024 23:23
codecov bot commented Mar 14, 2024

Codecov Report

Attention: Patch coverage is 65.21739%, with 8 lines in your changes missing coverage. Please review.

Project coverage is 91.73%. Comparing base (e8cabb7) to head (ee3f4bc).

| Files | Patch % | Lines |
|---|---|---|
| config/configgrpc/configgrpc.go | 0.00% | 2 Missing and 2 partials ⚠️ |
| config/confighttp/confighttp.go | 60.00% | 2 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9767      +/-   ##
==========================================
- Coverage   91.79%   91.73%   -0.06%     
==========================================
  Files         358      358              
  Lines       16576    16584       +8     
==========================================
- Hits        15216    15214       -2     
- Misses       1037     1042       +5     
- Partials      323      328       +5     


@dmitryax dmitryax force-pushed the move-internal-metrics-to-normal branch from d2068ac to 2218e9b Compare March 15, 2024 06:20
@mx-psi mx-psi left a comment

How does this interact with #9510 (comment)?

I think we should decide whether to remove Metrics Level and, if so, what our plan is until we remove it.

@mx-psi mx-psi requested review from bogdandrutu and codeboten March 15, 2024 11:05
dmitryax commented Mar 15, 2024

I'm ok with having other configs for that, like views. AFAIU, this isn't possible even with telemetry.useOtelWithSDKConfigurationForInternalTelemetry. @codeboten, please correct me if I am wrong. If there are other plans to introduce this functionality, I'd like to help.

@dmitryax dmitryax changed the title Distributed internal metrics across different levels Distribute internal metrics across different levels Mar 15, 2024
@github-actions

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@codeboten codeboten left a comment

This looks ok, just one question

meter metric.Meter
)

// BatchProcessor metrics are not subject to the same level of filtering as other components.

Isn't this true for all components? If the default is normal, aren't all metrics (other than instrumentation library metrics) emitted?

> Isn't this true for all components?

Yes, it's true for all components. With this comment, I wanted to say that the batch processor is the only component in core emitting metrics (with the normal level). It's probably confusing. Let me update it.

> If the default is normal, aren't all metrics (other than instrumentation library metrics) emitted?

Right.

The internal metrics levels are updated along with reported metrics:
    - The default level is changed from `basic` to `normal`, which can be overridden with the `service::telemetry::metrics::level` configuration.
    - Batch processor metrics are updated to be reported starting from `normal` level:
      - `processor_batch_batch_send_size`
      - `processor_batch_metadata_cardinality`
      - `processor_batch_timeout_trigger_send`
      - `processor_batch_size_trigger_send`
    - gRPC/HTTP server and client metrics are updated to be reported starting from `detailed` level:
      - `http.client.*` metrics
      - `http.server.*` metrics
      - `rpc.server.*` metrics
      - `rpc.client.*` metrics
@dmitryax dmitryax merged commit 670c12d into open-telemetry:main Apr 16, 2024
@github-actions github-actions bot added this to the next release milestone Apr 16, 2024
@dmitryax dmitryax deleted the move-internal-metrics-to-normal branch May 2, 2024 04:17
github-merge-queue bot pushed a commit that referenced this pull request Mar 4, 2025
…l guidelines (#12525)

#### Description

This PR:
- requires "level: normal" before outputting batch processor metrics (in
addition to one specific metric which was already restricted to "level:
detailed")
- clarifies wording in the telemetry level guidelines and documentation,
and adds said guidelines to the requirements for stable components.

Some rationale for these changes can be found in the tracking issue and
[this
comment](#7890 (comment)).

#### Link to tracking issue
Resolves #7890

#### To be discussed

Should we add a feature gate for this, in case a user relies on "level:
basic" outputting batch processor metrics? This feels like a niche use
case, so considering the "alpha" stability level of these metrics, I
don't think it's really necessary.

Considering batch processor metrics had already been switched to
"normal" once (#9767), but were turned back to basic at some later point
(not sure when), we might also want to add tests to avoid further
regressions (especially as the handling of telemetry levels is bound to
change further with #11754).
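
One way to guard against the regression described above is a table-driven check over the levels. The sketch below is an illustration under stated assumptions, not the Collector's test code: `Level`, `batchMetricsEnabled`, and `checkBatchLevels` are hypothetical names, and a real test would exercise the batch processor's telemetry rather than a stand-in function.

```go
package main

import "fmt"

// Level is a hypothetical stand-in for the configured telemetry level.
type Level int

const (
	LevelNone Level = iota
	LevelBasic
	LevelNormal
	LevelDetailed
)

// batchMetricsEnabled is a hypothetical stand-in for the code under test.
// It reports whether batch processor metrics are emitted at a given level.
func batchMetricsEnabled(l Level) bool {
	return l >= LevelNormal
}

// checkBatchLevels runs a table of expectations against fn and returns an
// error describing the first level whose behavior does not match.
func checkBatchLevels(fn func(Level) bool) error {
	cases := []struct {
		name  string
		level Level
		want  bool
	}{
		{"basic", LevelBasic, false},
		{"normal", LevelNormal, true},
		{"detailed", LevelDetailed, true},
	}
	for _, c := range cases {
		if got := fn(c.level); got != c.want {
			return fmt.Errorf("level %s: batch metrics enabled = %t, want %t", c.name, got, c.want)
		}
	}
	return nil
}
```

A gate that slipped back to `basic` would fail the first case, which is the kind of drift this PR describes. The snippet has no `main` function; call `checkBatchLevels(batchMetricsEnabled)` from a test or from `main` to try it.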

---------

Co-authored-by: Dmitrii Anoshin <[email protected]>