[telemetry] emit metrics with _ instead of / #9775
Merged
Changes from 5 commits

Commits (6):
- f0e6c50 [telemetry] emit metrics with . instead of / (codeboten)
- 3a2c08f changelog (codeboten)
- 2530f58 Merge branch 'main' into codeboten/use-period (codeboten)
- 3ef2752 update var name (codeboten)
- 393538b Merge branch 'main' into codeboten/use-period (codeboten)
- 483643f update . to _ (codeboten)
Changelog entry (new file, hunk `@@ -0,0 +1,29 @@`):

```yaml
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: breaking

# The name of the component, or a single word describing the area of concern, (e.g. otlpreceiver)
component: service

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: emit internal collector metrics with . instead of / with OTLP export

# One or more tracking issues or pull requests related to the change
issues: [9774]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
  This is addressing an issue w/ the names of the metrics generated by the Collector for its
  internal metrics. Note that this change only impacts users that emit telemetry using OTLP, which
  is currently still in experimental support. The prometheus metrics already replaced `/` with `_`
  and they will do the same with `.`.

# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: []
```
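The subtext's point that the Prometheus export already rewrites `/` to `_` (and would do the same for `.`) can be illustrated with a small Go sketch. `promName` is a hypothetical helper, not the Prometheus exporter's actual code; it assumes the `otelcol_` prefix mentioned later in this thread and only handles the two separators discussed here.

```go
package main

import (
	"fmt"
	"strings"
)

// promName is a hypothetical approximation of how the Prometheus export
// renders an internal Collector metric name: it adds the "otelcol_" prefix
// and replaces both "/" and "." with "_". Real exporters sanitize more
// characters than these two.
func promName(name string) string {
	r := strings.NewReplacer("/", "_", ".", "_")
	return "otelcol_" + r.Replace(name)
}

func main() {
	// Both separator choices collapse to the same Prometheus name.
	fmt.Println(promName("processor/batch/metadata_cardinality"))
	fmt.Println(promName("processor.batch.metadata_cardinality"))
}
```

Because both forms collapse to the same Prometheus name, only users exporting via OTLP see the separator change, which is the scope the changelog describes.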
`@@ -18,5 +18,5 @@ const (`

```go
)

var (
	ProcessorPrefix       = ProcessorKey + NameSep
	ProcessorMetricPrefix = ProcessorKey + MetricNameSep
)
```
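For context, here is a minimal sketch of how these prefixes could compose into a full metric name. The constant values are assumptions for illustration, since the hunk does not show them: `ProcessorKey` as `"processor"`, `NameSep` as `"/"`, and `MetricNameSep` as `"_"` to match the PR title. `customMetricName` is a hypothetical helper, not a Collector API.

```go
package main

import "fmt"

// Assumed values for illustration only; the diff hunk does not show them.
const (
	ProcessorKey  = "processor"
	NameSep       = "/" // separator kept for component IDs
	MetricNameSep = "_" // separator used in emitted metric names (per the PR title)
)

var (
	ProcessorPrefix       = ProcessorKey + NameSep
	ProcessorMetricPrefix = ProcessorKey + MetricNameSep
)

// customMetricName is a hypothetical helper that joins a processor type and a
// metric name using the metric-name separator.
func customMetricName(processorType, metric string) string {
	return ProcessorMetricPrefix + processorType + MetricNameSep + metric
}

func main() {
	fmt.Println(ProcessorPrefix)                                   // processor/
	fmt.Println(customMetricName("batch", "metadata_cardinality")) // processor_batch_metadata_cardinality
}
```

Keeping `NameSep` and `MetricNameSep` as separate values lets component identifiers keep `/` while metric names use a different separator.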
---
This won't be a breaking change for end users who scrape the metrics with Prometheus, right? But it will be a breaking change for anyone using the OTel Go SDK to emit metrics, right? I am worried about breaking end-user telemetry in a single release; should we add a feature gate?
---
Isn't OTel Go SDK support under a feature gate already?
---
Ya, but I'm always worried about changing telemetry names because of the impact on alerting. I believe our policy for alpha/beta feature gates is that we can change the feature without it counting as a breaking change. If we choose to do that, we could make it super clear and discoverable in the release notes what we broke.
---
+1 to Tyler's concern here. By making the separator here a `.` instead of a `_`, we would then create three possibilities for users today:

Rather than creating a third potential for this metric name, if the separator were `_` we would simply have this state:

Creating a third state disincentivizes users from sending OTLP (because it would break their dashboards/alerts) or from upgrading their collector further, for the same reason.

We already do not follow the conventions mentioned here; see the naming for time and utilization. Prometheus currently rules the metric world when it comes to naming, independently of the recommendations made by our own semantic conventions today. Given that, I think for legacy conversions we should always opt for not breaking users by following the existing convention. We should reconsider this once semantic conventions exist not only for metric naming recommendations (when to use a `.` or a `_`), but also for pipeline monitoring.
---
Right, the problem I see with using `_` is that today, switching a collector from emitting metrics via Prometheus to OTLP already breaks metrics we don't control. As an example, the metric for gRPC instrumentation changes from:

to:

This is because the Prometheus export replaces `.` with `_` (as it does with `/` for other collector metrics). Replacing the separator with `_` instead of `.` means users will still have to contend with a broken metric; it's just not clear to them which metrics will be broken, as they may not know what constitutes an instrumentation metric versus a collector metric. I opted to go with `.` as it at least aligns with the OpenTelemetry semantic conventions.
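To make this concern concrete, the sketch below compares a Collector-owned metric with an instrumentation metric when a user switches from the Prometheus export to OTLP. It is a simplified, hypothetical model: `promName` only rewrites `/` and `.` and ignores the `otelcol_` prefix and unit suffixes, the Collector metric is assumed to use `_`, and `rpc.server.duration` is an assumed example of a gRPC instrumentation metric name.

```go
package main

import (
	"fmt"
	"strings"
)

// promName approximates the Prometheus export's name sanitization
// (hypothetical and simplified: the "otelcol_" prefix and unit suffixes are
// ignored). Both "/" and "." become "_".
func promName(name string) string {
	return strings.NewReplacer("/", "_", ".", "_").Replace(name)
}

// otlpName models the OTLP path: the name is sent unchanged, since the OTLP
// exporter has no separator or prefix configuration.
func otlpName(name string) string { return name }

// changedOnSwitch reports whether a metric's name differs between the
// Prometheus export and the OTLP export.
func changedOnSwitch(name string) bool { return promName(name) != otlpName(name) }

func main() {
	metrics := []struct{ kind, name string }{
		// Collector-owned metric, assuming the Collector emits it with "_".
		{"collector", "processor_batch_metadata_cardinality"},
		// Assumed example of a gRPC instrumentation metric; not controlled by the Collector.
		{"instrumentation", "rpc.server.duration"},
	}
	for _, m := range metrics {
		fmt.Printf("%s: %s -> %s (changed: %v)\n",
			m.kind, promName(m.name), otlpName(m.name), changedOnSwitch(m.name))
	}
}
```

Under these assumptions, with `_` only the instrumentation metric changes on the switch, which is the partial breakage described above; if Collector metrics used `.`, both kinds would change together.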
---
It is true that breaking the name is going to disincentivize users, but I don't think we want to hold ourselves to Prometheus naming standards when OTel has its own naming standard.
I can see an argument for waiting until more semantic conventions are stabilized before making the breaking change, so that our end users face only one breaking change.
I am OK keeping this change behind the `useOtelWithSDKConfigurationForInternalTelemetry` feature gate since it is still alpha.
---
Yeah, my other concern is that by changing the convention for the collector's own metrics, we actually invalidate any documents for stable/beta components that mention existing metrics. These documents list the important metrics for various components; none mention that these follow the "Prometheus convention", only that they are the important metrics for monitoring your collector. By creating a dual state, all of these documents would need to be edited as part of this change and then edited again once the OTEP provides guidance.
Similarly, I would argue that by not keeping the metrics consistent we are introducing a bug into the collector: we would not be able to make the `useOtelWithSDKConfigurationForInternalTelemetry` flag beta, because then anyone emitting metrics via OTLP would see incorrect metric names for components like the batch processor.
---
@codeboten is there a world where we handle the switch to the OTel SDK in two steps:

1. Move `useOtelWithSDKConfigurationForInternalTelemetry` to `Stable` without breaking the emitted metric names:
---
So I see two problems, and maybe it's easier to reason about a decision to move forward if we think about them separately.
1. Metrics controlled by the collector
These are the metrics that come from instruments instantiated in the collector itself. An example of such a metric is `processor/batch/metadata_cardinality`. In the Prometheus export, this metric is generated with the `otelcol_` prefix and all `/` replaced with `_`, resulting in a metric named `otelcol_processor_batch_metadata_cardinality`.

These metrics are currently emitted one way via the Prometheus export and another way via the OTLP export:
After this change, it would be emitted as follows (which aligns more closely with OTel conventions):

The alternative would be to emit it as:

Ignoring the missing prefix, as that's handled in a separate change, this would remain consistent with the Prometheus naming that exists today.
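The three naming options for the batch processor metric can be compared side by side. This sketch builds the name from its segments with each candidate separator and checks which one matches today's Prometheus name once the `otelcol_` prefix is set aside, as the comment does. `joinName`, `promName`, and `matchesProm` are hypothetical helpers, and the sanitization is simplified to `/` and `.`.

```go
package main

import (
	"fmt"
	"strings"
)

// segments of the metric name discussed in this thread.
var segments = []string{"processor", "batch", "metadata_cardinality"}

// joinName builds the metric name with the given separator (hypothetical helper).
func joinName(sep string) string { return strings.Join(segments, sep) }

// promName approximates the Prometheus export: "otelcol_" prefix plus
// "/" and "." replaced with "_" (simplified).
func promName(name string) string {
	return "otelcol_" + strings.NewReplacer("/", "_", ".", "_").Replace(name)
}

// matchesProm reports whether an OTLP name equals the Prometheus name once
// the "otelcol_" prefix is stripped.
func matchesProm(otlp string) bool {
	return strings.TrimPrefix(promName(otlp), "otelcol_") == otlp
}

func main() {
	for _, sep := range []string{"/", ".", "_"} {
		name := joinName(sep)
		fmt.Printf("%-40s prometheus=%s matches=%v\n", name, promName(name), matchesProm(name))
	}
}
```

Only the `_` form matches the existing Prometheus name (minus the prefix), which is the consistency argument for the alternative; the `.` form instead trades that for alignment with OTel conventions.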
2. Metrics generated by instrumentation
These are metrics we don't control, as they are produced by instrumentation libraries over which we have limited control (outside of using views). These metrics currently look like this:
After this change, nothing changes in the metric itself, as there is no configuration for the separator or prefix for the OTLP exporter.
My proposal was to align 1 and 2 more closely, so that users don't end up with some of their metrics continuing to work but not others. I thought that putting this behind an experimental flag would allow users to make a change across all the metrics generated by the collector at the same time, reducing the chance that only some of their metrics work.
That being said, I can also see the benefit of having the metrics under the collector's control not change multiple times. But as an end user, it may not be clear why only the instrumentation library metrics changed and not all of them.
I'm happy to discuss this at tomorrow's SIG call if it makes sense @jaronoff97 @TylerHelmuth
---
Ya let's discuss some more at the SIG