
rename slo-aware-router to predicted-latency#2183

Merged
k8s-ci-robot merged 11 commits into kubernetes-sigs:main from tomatillo-and-multiverse:main
Jan 21, 2026

Conversation

@kaushikmitr
Contributor

@kaushikmitr kaushikmitr commented Jan 19, 2026

This pull request replaces the old SLO-aware router plugin with a new predicted latency plugin in the scheduling framework. The change is a rename and refactor: all relevant code and registration logic now uses the new predicted_latency plugin, keeping naming consistent across the codebase.

Plugin replacement and registration:

  • The SLO-aware router plugin (slo_aware_router) is replaced with the predicted latency plugin (predicted_latency) in the plugin registration logic within runner.go. All imports and registrations now reference the new plugin. [1] [2]
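To make the registration change concrete, here is a minimal sketch of a factory-based plugin registry of the kind the PR touches. The registry API and type names are hypothetical stand-ins, not the actual runner.go code.

```go
package main

import "fmt"

// Plugin is a minimal stand-in for the scheduling framework's plugin interface.
type Plugin interface{ Name() string }

// predictedLatency is the renamed plugin; before this PR it would have been
// the slo_aware_router type.
type predictedLatency struct{}

func (p *predictedLatency) Name() string { return "predicted_latency" }

// registry maps plugin names to factories, as a registration layer might.
var registry = map[string]func() Plugin{}

// Register records a factory under a plugin name.
func Register(name string, factory func() Plugin) { registry[name] = factory }

func main() {
	// Before: Register("slo_aware_router", ...) — removed by this PR.
	Register("predicted_latency", func() Plugin { return &predictedLatency{} })
	fmt.Println(registry["predicted_latency"]().Name()) // predicted_latency
}
```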

Refactoring and renaming for consistency:

  • All files and internal identifiers previously named for slo_aware_router are renamed to predicted_latency, including package names, struct names, and variable names (e.g., SLOAwareRouter → PredictedLatency, sloRequestContext → predictedLatencyCtx). [1] [2] [3] [4] [5] [6] [7]

Function and variable updates:

  • All function signatures and internal logic are updated to use the new types and variable names, ensuring that context, metrics, and prediction logic refer to predictedLatencyCtx and related identifiers instead of the old SLO-aware router names. [1] [2] [3] [4] [5]

These changes modernize the latency prediction plugin and ensure naming consistency throughout the codebase.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 19, 2026
@netlify

netlify Bot commented Jan 19, 2026

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 0c05adf
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/6970b77ec9dd48000862ed89
😎 Deploy Preview https://deploy-preview-2183--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 19, 2026
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 19, 2026
@kaushikmitr
Contributor Author

kaushikmitr commented Jan 19, 2026

related to #2032 Once this merges will update the docs in a follow up PR.

Contributor

@ahg-g ahg-g left a comment


Why did we need to change TPOT to ITL? Is ITL the more popular term now? Note that we use normalized_time_per_output_token in one of the metrics we currently report.

It would have been easier to review this PR if we did the slo -> predicted latency rename and then followed that up with the tpot -> itl one.

itlSLOHeaderKey = "x-slo-itl-ms"

// Sheddable header string
sheddableHeaderKey = "x-request-sheddable"
Contributor


what is this?


// parseBoolHeader retrieves a header by name, parses it as a bool,
// and returns the value or an error if the header is missing or invalid.
func parseBoolHeader(request schedulingtypes.LLMRequest, headerName string) (bool, error) {
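For context, a self-contained sketch of what such a helper might look like. The LLMRequest shape here (a plain Headers map) is an assumption for illustration, not the actual schedulingtypes.LLMRequest type.

```go
package main

import (
	"fmt"
	"strconv"
)

// LLMRequest is a simplified stand-in assuming headers are exposed as a map;
// the real schedulingtypes.LLMRequest may differ.
type LLMRequest struct {
	Headers map[string]string
}

// parseBoolHeader retrieves a header by name, parses it as a bool,
// and returns the value or an error if the header is missing or invalid.
func parseBoolHeader(request LLMRequest, headerName string) (bool, error) {
	raw, ok := request.Headers[headerName]
	if !ok {
		return false, fmt.Errorf("missing header %q", headerName)
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return false, fmt.Errorf("invalid bool header %q=%q: %w", headerName, raw, err)
	}
	return v, nil
}

func main() {
	r := LLMRequest{Headers: map[string]string{"x-request-sheddable": "true"}}
	v, err := parseBoolHeader(r, "x-request-sheddable")
	fmt.Println(v, err) // true <nil>
}
```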
Contributor


Why do we need this? If this is a renaming PR, no new logic should be expected in it.

limitations under the License.
*/

package predicted_latency
Contributor


It seems this is a net new file with net new logic, is this related to the renaming?

Contributor Author


Yes, it was some shedding-related changes I was working on, and it does not belong to this PR. I removed those. Now this PR just renames slo-aware-router to predicted-latency.

@kaushikmitr
Contributor Author

> Why did we need to change TPOT to ITL? Is ITL the more popular term now? Note that we use normalized_time_per_output_token in one of the metrics we currently report.
>
> It would have been easier to review this PR if we did the slo -> predicted latency rename and then followed that up with the tpot -> itl one.

Yes, let me split this PR into two, focusing first on renaming the scorer. There is a subtle difference between ITL and TPOT as reported by vLLM (ITL is per-token, while TPOT is the request-weighted average of the ITLs). What we are measuring and optimizing for in the model is ITL.
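The ITL/TPOT distinction the author describes can be sketched from per-token arrival timestamps. These helpers are illustrative only, not functions from the PR or from vLLM.

```go
package main

import "fmt"

// itls returns the per-token inter-token latencies (ITLs) for one request:
// the gap between each consecutive pair of output-token timestamps (seconds).
func itls(tokenTimes []float64) []float64 {
	if len(tokenTimes) < 2 {
		return nil
	}
	out := make([]float64, 0, len(tokenTimes)-1)
	for i := 1; i < len(tokenTimes); i++ {
		out = append(out, tokenTimes[i]-tokenTimes[i-1])
	}
	return out
}

// tpot is the request-level time per output token: the average of that
// request's ITLs, i.e. total decode time over the number of gaps.
func tpot(tokenTimes []float64) float64 {
	gaps := itls(tokenTimes)
	if len(gaps) == 0 {
		return 0
	}
	sum := 0.0
	for _, g := range gaps {
		sum += g
	}
	return sum / float64(len(gaps))
}

func main() {
	times := []float64{0.0, 0.05, 0.15, 0.20}
	fmt.Println(itls(times)) // the individual per-token gaps
	fmt.Println(tpot(times)) // their request-level average
}
```

A predictor trained on individual ITLs sees each gap as a sample, while TPOT collapses a whole request into one number, which is why the two are not interchangeable.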

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 21, 2026
@kaushikmitr kaushikmitr changed the title from "rename slo-aware-router to predicted-latency and tpot to itl" to "rename slo-aware-router to predicted-latency" Jan 21, 2026
- targetPod := sloCtx.targetMetadata
- prefix_cache_score := sloCtx.prefixCacheScoresForEndpoints[targetPod.String()]
+ targetPod := predictedLatencyCtx.targetMetadata
+ prefix_cache_score := predictedLatencyCtx.prefixCacheScoresForEndpoints[targetPod.NamespacedName.Name]
Contributor


Was this a bug?

We should use targetPod.NamespacedName.String(); a pod name is not unique across namespaces. But we can do that as a follow-up in a separate PR, since we should do this in all places where the prefix cache is used.
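The collision the reviewer is pointing at is easy to demonstrate. The NamespacedName type below is a simplified stand-in for k8s.io/apimachinery's types.NamespacedName, not the PR's actual code.

```go
package main

import "fmt"

// NamespacedName is a simplified stand-in for the Kubernetes type of the
// same name: a pod is identified by namespace plus name.
type NamespacedName struct {
	Namespace, Name string
}

// String returns the namespace/name form, which is unique cluster-wide.
func (n NamespacedName) String() string { return n.Namespace + "/" + n.Name }

func main() {
	a := NamespacedName{Namespace: "team-a", Name: "vllm-0"}
	b := NamespacedName{Namespace: "team-b", Name: "vllm-0"}

	// Keying by bare pod name: both pods map to the key "vllm-0",
	// so the second write silently overwrites the first.
	byName := map[string]float64{}
	byName[a.Name] = 0.9
	byName[b.Name] = 0.1

	// Keying by the full namespaced name keeps the entries distinct.
	byFullName := map[string]float64{}
	byFullName[a.String()] = 0.9
	byFullName[b.String()] = 0.1

	fmt.Println(len(byName), len(byFullName)) // 1 2
}
```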

Contributor Author


Yes, this was a bug: targetPod.String() broke the prefix cache scoring logic in the latency predictor. I think it happened during the renaming of pods to endpoints (not sure, I need to look back at the history).

Contributor


So we have a gap in tests that we should address in a follow-up PR; please also update the key to use the full namespaced name in another follow-up PR.


// sheddable indicates if the request can be shed if no valid endpoint is available.
sheddable bool

Contributor


??

Contributor Author


removed

@ahg-g
Contributor

ahg-g commented Jan 21, 2026

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 21, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, kaushikmitr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 21, 2026
@k8s-ci-robot k8s-ci-robot merged commit 220b765 into kubernetes-sigs:main Jan 21, 2026
11 checks passed
elevran pushed a commit to llm-d/llm-d-inference-scheduler that referenced this pull request Apr 23, 2026
…-api-inference-extension#2183)

* rename slo-aware-router to predicted-latency and tpot to itl

* fix fmt error

* remove unused files

* fix test

* fix lint errors

* remove tpot renaming

* remmove shedding related changes

* remmove shedding related changes 2

* fix sidecar renaming

* removed sheddable var
