-
Notifications
You must be signed in to change notification settings - Fork 201
updated metrics and logging for plugins #1235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
updated metrics and logging for plugins #1235
Conversation
Signed-off-by: Nir Rozenbaum <[email protected]>
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
CC: @JeffLuoo |
|
/approve Looks great to me. Will let Jeff give final stamp. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kfswain, nirrozenbaum The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
| SchedulerPluginProcessingLatencies = prometheus.NewHistogramVec( | ||
| prometheus.HistogramOpts{ | ||
| Subsystem: InferenceExtension, | ||
| Name: "scheduler_plugin_duration_seconds", | ||
| Help: metricsutil.HelpMsgWithStability("Scheduler plugin processing latency distribution in seconds for each plugin type and plugin name.", compbasemetrics.ALPHA), | ||
| Buckets: []float64{ | ||
| 0.0001, 0.0002, 0.0005, 0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, | ||
| }, | ||
| }, | ||
| []string{"plugin_type", "plugin_name"}, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR LGTM'd but deleting an existing metric should be careful when we promote the metric to Beta in the future: https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecating-a-metric
The deprecated metric should be kept for 2 releases or 8 months. Since current metric is still Alpha so we are good with removing the metric directly. We need to track the progress of marking metrics as beta for stability of metrics. I will create a tracker for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JeffLuoo what is the expectation when adding new metrics? use alpha for a while and then promote to beta?
is there also a promotion from beta to "v1" or similar?
additionally, what is the required time to have a metric in alpha before promoting it to beta?
I expect more metrics to be added as we continue to make progress and I think a clear documentation of these points could be useful (e.g., metrics management guide).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v1 in Kubernetes will be Stable. Hence the cycle will be Alpha -> Beta -> Stable.
There isn't rigid requirement to say a metric has to be promoted to next cycle in xx months. It's all about project owner to determine when to promote. Higher stability level will just mean less breaking change so external dependencies like alerts and dashboards can rely on it.
+1 that a "metrics management guide" is recommended to manage the lifecycle of the metric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JeffLuoo would you be interested in taking an AI to create this metrics management guide?
JeffLuoo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Signed-off-by: Nir Rozenbaum <[email protected]>
This PR updates metrics and logging of the various extension points.
the following changes are included:
RecordSchedulerPlugin..andRecordRequestControlPlugin.... instead we now have justRecordPlugin.... there is no need to differentiate between the functions, the extension point is included.