rename slo-aware-router to predicted-latency by kaushikmitr · Pull Request #2183 · kubernetes-sigs/gateway-api-inference-extension

kaushikmitr · 2026-01-19T19:25:36Z

This pull request replaces the old SLO-aware router plugin with a new predicted latency plugin in the scheduling framework. The main focus is on renaming, refactoring, and updating all relevant code and registration logic to use the new predicted_latency plugin, ensuring consistency across the codebase.

Plugin replacement and registration:

The SLO-aware router plugin (slo_aware_router) is replaced with the predicted latency plugin (predicted_latency) in the plugin registration logic within runner.go. All imports and registrations now reference the new plugin. [1] [2]
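As a rough illustration of what swapping a plugin registration involves, the sketch below registers a factory under a new type name and shows that a lookup by the old name then fails. The `Plugin`, `Factory`, and `Registry` types and both name strings are hypothetical stand-ins, not the actual API used in runner.go.

```go
package main

import "fmt"

// Plugin, Factory, and Registry are hypothetical stand-ins for illustration;
// they are not the project's real types.
type Plugin interface {
	Name() string
}

// Factory builds a new plugin instance.
type Factory func() Plugin

// Registry maps a plugin type name to the factory that builds it.
type Registry struct {
	factories map[string]Factory
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{factories: make(map[string]Factory)}
}

// Register adds a factory under pluginType and rejects duplicate names.
func (r *Registry) Register(pluginType string, f Factory) error {
	if _, exists := r.factories[pluginType]; exists {
		return fmt.Errorf("plugin type %q already registered", pluginType)
	}
	r.factories[pluginType] = f
	return nil
}

// Build instantiates the plugin registered under pluginType.
func (r *Registry) Build(pluginType string) (Plugin, error) {
	f, ok := r.factories[pluginType]
	if !ok {
		return nil, fmt.Errorf("unknown plugin type %q", pluginType)
	}
	return f(), nil
}

// predictedLatency is a placeholder plugin with an assumed type name.
type predictedLatency struct{}

func (predictedLatency) Name() string { return "predicted-latency" }

func main() {
	r := NewRegistry()
	if err := r.Register("predicted-latency", func() Plugin { return predictedLatency{} }); err != nil {
		panic(err)
	}

	// After the rename, only the new name resolves.
	if _, err := r.Build("slo-aware-router"); err != nil {
		fmt.Println("old name:", err)
	}
	p, err := r.Build("predicted-latency")
	if err != nil {
		panic(err)
	}
	fmt.Println("new name:", p.Name())
}
```

One consequence of a rename like this is that any configuration still referring to the old type name stops resolving, which is why the docs update mentioned later in the thread matters.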

Refactoring and renaming for consistency:

All files and internal identifiers previously named for slo_aware_router are renamed to predicted_latency, including package names, struct names, and variable names (e.g., SLOAwareRouter → PredictedLatency, sloRequestContext → predictedLatencyCtx). [1] [2] [3] [4] [5] [6] [7]

Function and variable updates:

All function signatures and internal logic are updated to use the new types and variable names, ensuring that context, metrics, and prediction logic refer to predictedLatencyCtx and related identifiers instead of the old SLO-aware router names. [1] [2] [3] [4] [5]

These changes modernize the latency prediction plugin and ensure naming consistency throughout the codebase.

netlify · 2026-01-19T19:25:45Z

✅ Deploy Preview for gateway-api-inference-extension ready!

Name	Link
🔨 Latest commit	`0c05adf`
🔍 Latest deploy log	https://app.netlify.com/projects/gateway-api-inference-extension/deploys/6970b77ec9dd48000862ed89
😎 Deploy Preview	https://deploy-preview-2183--gateway-api-inference-extension.netlify.app

kaushikmitr · 2026-01-19T19:44:40Z

Related to #2032. Once this merges, I will update the docs in a follow-up PR.

ahg-g

Why did we need to change tpot to itl? Is itl the more popular term now? Note that we use normalized_time_per_output_token in one of the metrics we currently report.

It would have been easier to review this PR if we had done the slo -> predicted latency rename first and then followed up with the tpot -> itl rename.

ahg-g · 2026-01-20T10:50:24Z

+	itlSLOHeaderKey = "x-slo-itl-ms"
+
+	// Sheddable header string
+	sheddableHeaderKey = "x-request-sheddable"


What is this?

ahg-g · 2026-01-20T10:52:06Z

+
+// parseBoolHeader retrieves a header by name, parses it as a bool,
+// and returns the value or an error if the header is missing or invalid.
+func parseBoolHeader(request schedulingtypes.LLMRequest, headerName string) (bool, error) {


Why do we need this? If this is a renaming PR, no new logic should be expected in it.
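For context, a helper matching the doc comment in the diff above could look roughly like the sketch below. The `request` type is a hypothetical stand-in for `schedulingtypes.LLMRequest` that models only a `Headers` map, and the body is an assumption based on the comment, not the code from this PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// request is a hypothetical stand-in for schedulingtypes.LLMRequest;
// only a Headers map is modeled here.
type request struct {
	Headers map[string]string
}

// parseBoolHeader retrieves a header by name, parses it as a bool,
// and returns the value or an error if the header is missing or invalid.
func parseBoolHeader(req request, headerName string) (bool, error) {
	raw, ok := req.Headers[headerName]
	if !ok {
		return false, fmt.Errorf("header %q not found", headerName)
	}
	// strconv.ParseBool accepts 1, t, T, TRUE, true, True and their false counterparts.
	value, err := strconv.ParseBool(raw)
	if err != nil {
		return false, fmt.Errorf("header %q has invalid bool value %q: %w", headerName, raw, err)
	}
	return value, nil
}

func main() {
	req := request{Headers: map[string]string{"x-request-sheddable": "true"}}
	sheddable, err := parseBoolHeader(req, "x-request-sheddable")
	fmt.Println(sheddable, err)
}
```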

ahg-g · 2026-01-20T10:55:20Z

+limitations under the License.
+*/
+
+package predicted_latency


It seems this is a net-new file with net-new logic. Is this related to the renaming?

Yes, these were some shedding-related changes I was working on, and they do not belong in this PR. I removed them. Now this PR just renames slo-aware-router to predicted-latency.

kaushikmitr · 2026-01-20T13:48:45Z

> Why did we need to change tpot to itl? Is itl the more popular term now? Note that we use normalized_time_per_output_token in one of the metrics we currently report.
>
> It would have been easier to review this PR if we had done the slo -> predicted latency rename first and then followed up with the tpot -> itl rename.

Yes, let me split this PR into two, focusing first on renaming the scorer. There is a subtle difference between ITL and TPOT as reported by vLLM (ITL is per-token, and TPOT is a request-weighted average of ITL). What we are measuring and optimizing for in the model is ITL.

ahg-g · 2026-01-21T10:22:48Z

-	targetPod := sloCtx.targetMetadata
-	prefix_cache_score := sloCtx.prefixCacheScoresForEndpoints[targetPod.String()]
+	targetPod := predictedLatencyCtx.targetMetadata
+	prefix_cache_score := predictedLatencyCtx.prefixCacheScoresForEndpoints[targetPod.NamespacedName.Name]


Was this a bug?

We should use targetPod.NamespacedName.String(), because a pod name is not unique across namespaces. We can do that as a follow-up in a separate PR, since the same change is needed everywhere the prefix cache is used.

Yes, this was a bug: targetPod.String() broke the prefix cache scoring logic in the latency predictor. I think it happened during the renaming of pods to endpoints (not sure; I need to look back at the history).

So we have a gap in tests that we should address in a follow-up PR. Please also update the key to use the full namespaced name in another follow-up PR.
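The key-collision concern in this thread can be sketched as follows. `NamespacedName` here is a local stand-in that mirrors the shape of the Kubernetes type of the same name, and `countKeys` is a hypothetical helper; neither is the project's code.

```go
package main

import "fmt"

// NamespacedName is a local stand-in mirroring the shape of the
// Kubernetes NamespacedName type.
type NamespacedName struct {
	Namespace string
	Name      string
}

// String renders the name as "namespace/name".
func (n NamespacedName) String() string {
	return n.Namespace + "/" + n.Name
}

// countKeys reports how many distinct map keys the pods produce
// under the given key function.
func countKeys(pods []NamespacedName, key func(NamespacedName) string) int {
	scores := make(map[string]float64)
	for _, p := range pods {
		scores[key(p)] = 1.0
	}
	return len(scores)
}

func main() {
	pods := []NamespacedName{
		{Namespace: "team-a", Name: "vllm-0"},
		{Namespace: "team-b", Name: "vllm-0"},
	}

	// Keying by bare name collapses the two pods into one entry.
	byName := countKeys(pods, func(n NamespacedName) string { return n.Name })
	// Keying by the full namespaced name keeps them distinct.
	byFullName := countKeys(pods, NamespacedName.String)

	fmt.Println(byName, byFullName) // 1 2
}
```

Whichever form is chosen, the writer and the reader of the score map have to agree on it, which is the mismatch described as the bug above.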

ahg-g · 2026-01-21T10:24:48Z


+	// sheddable indicates if the request can be shed if no valid endpoint is available.
+	sheddable bool
+


ahg-g · 2026-01-21T11:33:24Z

/approve
/lgtm

k8s-ci-robot · 2026-01-21T11:33:33Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, kaushikmitr

Needs approval from an approver in each of these files:

~~OWNERS~~ [ahg-g]


…-api-inference-extension#2183) * rename slo-aware-router to predicted-latency and tpot to itl * fix fmt error * remove unused files * fix test * fix lint errors * remove tpot renaming * remmove shedding related changes * remmove shedding related changes 2 * fix sidecar renaming * removed sheddable var

rename slo-aware-router to predicted-latency and tpot to itl

67b7feb

k8s-ci-robot added the cncf-cla: yes label Jan 19, 2026

k8s-ci-robot requested review from ahg-g and robscott January 19, 2026 19:25

k8s-ci-robot added the needs-rebase and size/XXL labels Jan 19, 2026

Resolved merge conflicts and merged feature branch

4a19b11

k8s-ci-robot removed the needs-rebase label Jan 19, 2026

kaushikmitr added 4 commits January 19, 2026 19:51

fix fmt error

fe70fb8

remove unused files

75fac57

fix test

8158d6c

fix lint errors

964ae37

ahg-g reviewed Jan 20, 2026

kaushikmitr added 3 commits January 21, 2026 08:59

remove tpot renaming

d196b5e

remmove shedding related changes

f741260

remmove shedding related changes 2

bb8f453

k8s-ci-robot added the size/XL label and removed the size/XXL label Jan 21, 2026

kaushikmitr changed the title ~~rename slo-aware-router to predicted-latency and tpot to itl~~ rename slo-aware-router to predicted-latency Jan 21, 2026

fix sidecar renaming

a5664d8

ahg-g reviewed Jan 21, 2026

removed sheddable var

0c05adf

k8s-ci-robot assigned ahg-g Jan 21, 2026

k8s-ci-robot added the lgtm label Jan 21, 2026

k8s-ci-robot added the approved label Jan 21, 2026

k8s-ci-robot merged commit 220b765 into kubernetes-sigs:main Jan 21, 2026
11 checks passed

