SLO Aware Routing Sidecar + Plugin EPP Integration and Helm Deployment #1839
✅ Deploy Preview for gateway-api-inference-extension ready!
Hi @BenjaminBraunDev. Thanks for your PR.
I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
Force-pushed from 2c56616 to f63bc01.
Force-pushed from d177545 to ddee4c7.
ahg-g left a comment:
It would be great if you could send a separate PR for adding the running requests metric.
@kaushikmitr @BenjaminBraunDev this is predicted latency, not SLO, right? If so, please use …
Force-pushed from b94d598 to 1eb5d8a.
Here's the issue regarding the …
/approve
[APPROVALNOTIFIER] This PR is APPROVED.
This pull-request has been approved by: BenjaminBraunDev, kfswain
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@kfswain Sorry, I forgot to add a small bugfix to the scorer plugin. Could you re-approve?
Np, approval remains. It's the LGTM that's removed. I'll leave that to @kaushikmitr to stamp.
Sry, I won't make you chase down a person again; I was just trying to get other people involved, is all.
/lgtm
Thanks!
kubernetes-sigs/gateway-api-inference-extension#1839)
* Add latency predictor plugins, deployment, and runner.go integration
* Update Dockerfile, fix issues with SLO context not being set when prediction is off
* Remove outdated inferencepool-resources deployment
* Fix streamed request being called one final time after the request completes, add predictor check to the beginning of each requestcontrol hook
* Add guide, update Helm charts and README, minor scorer changes
* Make small guide update
* Add Helm values and polish README and SLO routing guide
* Clean up errors from rebase, add running request metric to datasource, add predictor to new 2-phase configuration parser
* Fix EPP image and add placeholder Docker repos for latency sidecars
* Update guide, README, and values.yaml
* Move predictor setup logic into plugin
* Move predictor startup logic completely out of manager and into plugin, running routines there, move predictor Helm section into new tpl file, rename slo-aware-routing guide and names in docs
* Remove max-score-picker from list of plugin types in Helm chart
* Fix formatting
* Revert go.mod to main
* Fix typo in config, remove deprecated runtime flag
* Rename latency prediction plugins, change docs accordingly, make sidecars not fail immediately during EPP spinup
* Update docs with new total running requests metric
* Small plugin bugfix
This PR is stage 3/3 of adding the latency prediction and SLO-Aware Routing functionality to EPP.
New Features:
* -enable-latency-predictor flag in the EPP args to inform EPP that the sidecars are present and to register the SLO routing plugins.
* … (x-slo-ttft-ms and x-slo-tpot-ms) and a boolean for whether to use the SLO routing scheduling profile with SLO scoring (x-prediction-based-scheduling). If false, EPP uses the default profile and only tracks the request to train the predictor for future requests.
Plugins
Registers and deploys the plugins added in #1849 via scheduling profiles:
PodMetrics
Adds back the totalRunningRequestsMetric Prometheus metric from vLLM, which was previously removed as unused but is now an input feature of our latency prediction model.
Guide
Added a guide for how to deploy IGW with SLO-Aware Routing in site-src/guides/slo-aware-routing.md
Fixes #1323
Does this PR introduce a user-facing change?: