SLO Aware Routing Sidecar + Plugin EPP Integration and Helm Deployment #1839
Merged
k8s-ci-robot merged 19 commits into kubernetes-sigs:main from BenjaminBraunDev:slo-aware-routing-stage-3
Nov 26, 2025
Commits
ee4a8c3 Add latency predictor plugins, deployment, and runner.go integration (BenjaminBraunDev)
cb88003 Update dockerfile, fix issues with SLO context not being set when pre… (BenjaminBraunDev)
42754f5 Remove outdated inferencepool-resources deployment (BenjaminBraunDev)
dcab27d Fix streamed request being called one final time after request comple… (BenjaminBraunDev)
cce6421 add guide, update helm charts and readme, minor scorer changes (BenjaminBraunDev)
32aec98 Make small guide update (BenjaminBraunDev)
d7fd091 Add helm values and polish README and SLO routing guide (BenjaminBraunDev)
07f7430 Clean up errors from rebase, add running request metric to datasource… (BenjaminBraunDev)
1e57cf6 Fix epp image and add placeholder docker repos for latency sidecars (BenjaminBraunDev)
979ecb9 Update guide, README, and values.yaml (BenjaminBraunDev)
6c73dd2 Moved predictor setup logic into plugin (BenjaminBraunDev)
6dc8b02 Move predictor startup login completely out of manager and into plugi… (BenjaminBraunDev)
e4b8eec Remove max-score-picker from list of plugin types in helm chart (BenjaminBraunDev)
415d93f Fix formatting (BenjaminBraunDev)
e0bf3b0 Revert go.mod to main (BenjaminBraunDev)
5844d8f Fix typo in config, remove depreicated runtime flag (BenjaminBraunDev)
1eb5d8a Rename latency prediction plugins, change docs accordingly, make side… (BenjaminBraunDev)
e0afb89 Update docs with new total running requests metric (BenjaminBraunDev)
5b853dd Small plugin bugfix (BenjaminBraunDev)
112 changes: 112 additions & 0 deletions
config/charts/inferencepool/templates/_latency-predictor.tpl
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,112 @@ | ||
| {{/* | ||
| Latency Predictor Env | ||
| */}} | ||
| {{- define "gateway-api-inference-extension.latencyPredictor.env" -}} | ||
| {{- if .Values.inferenceExtension.latencyPredictor.enabled }} | ||
| - name: PREDICTION_SERVER_URL | ||
| value: "{{- $count := int .Values.inferenceExtension.latencyPredictor.predictionServers.count -}} | ||
| {{- $startPort := int .Values.inferenceExtension.latencyPredictor.predictionServers.startPort -}} | ||
| {{- range $i := until $count -}} | ||
| {{- if $i }},{{ end }}http://localhost:{{ add $startPort $i }} | ||
| {{- end }}" | ||
| - name: TRAINING_SERVER_URL | ||
| value: "http://localhost:{{ .Values.inferenceExtension.latencyPredictor.trainingServer.port }}" | ||
| {{- range $key, $value := .Values.inferenceExtension.latencyPredictor.eppEnv }} | ||
| - name: {{ $key }} | ||
| value: {{ $value | quote }} | ||
| {{- end }} | ||
| {{- end }} | ||
| {{- end }} | ||
| | ||
| {{/* | ||
| Latency Predictor Sidecar Containers | ||
| */}} | ||
| {{- define "gateway-api-inference-extension.latencyPredictor.containers" -}} | ||
| {{- if .Values.inferenceExtension.latencyPredictor.enabled }} | ||
| # Training Server Sidecar Container | ||
| - name: training-server | ||
| image: {{ .Values.inferenceExtension.latencyPredictor.trainingServer.image.hub }}/{{ .Values.inferenceExtension.latencyPredictor.trainingServer.image.name }}:{{ .Values.inferenceExtension.latencyPredictor.trainingServer.image.tag }} | ||
| imagePullPolicy: {{ .Values.inferenceExtension.latencyPredictor.trainingServer.image.pullPolicy }} | ||
| ports: | ||
| - containerPort: {{ .Values.inferenceExtension.latencyPredictor.trainingServer.port }} | ||
| name: training-port | ||
| livenessProbe: | ||
| {{- toYaml .Values.inferenceExtension.latencyPredictor.trainingServer.livenessProbe | nindent 4 }} | ||
| readinessProbe: | ||
| {{- toYaml .Values.inferenceExtension.latencyPredictor.trainingServer.readinessProbe | nindent 4 }} | ||
| resources: | ||
| {{- toYaml .Values.inferenceExtension.latencyPredictor.trainingServer.resources | nindent 4 }} | ||
| envFrom: | ||
| - configMapRef: | ||
| name: {{ include "gateway-api-inference-extension.name" . }}-latency-predictor-training | ||
| env: | ||
| - name: POD_NAME | ||
| valueFrom: | ||
| fieldRef: | ||
| fieldPath: metadata.name | ||
| - name: SERVER_TYPE | ||
| value: "training" | ||
| volumeMounts: | ||
| - name: training-server-storage | ||
| mountPath: /models | ||
| {{- range $i := until (int .Values.inferenceExtension.latencyPredictor.predictionServers.count) }} | ||
| # Prediction Server Sidecar Container {{ add $i 1 }} | ||
| - name: prediction-server-{{ add $i 1 }} | ||
| image: {{ $.Values.inferenceExtension.latencyPredictor.predictionServers.image.hub }}/{{ $.Values.inferenceExtension.latencyPredictor.predictionServers.image.name }}:{{ $.Values.inferenceExtension.latencyPredictor.predictionServers.image.tag }} | ||
| imagePullPolicy: {{ $.Values.inferenceExtension.latencyPredictor.predictionServers.image.pullPolicy }} | ||
| command: ["uvicorn"] | ||
| args: ["prediction_server:app", "--host", "0.0.0.0", "--port", "{{ add $.Values.inferenceExtension.latencyPredictor.predictionServers.startPort $i }}"] | ||
| ports: | ||
| - containerPort: {{ add $.Values.inferenceExtension.latencyPredictor.predictionServers.startPort $i }} | ||
| name: predict-port-{{ add $i 1 }} | ||
| livenessProbe: | ||
| httpGet: | ||
| path: {{ $.Values.inferenceExtension.latencyPredictor.predictionServers.livenessProbe.httpGet.path }} | ||
| port: {{ add $.Values.inferenceExtension.latencyPredictor.predictionServers.startPort $i }} | ||
| initialDelaySeconds: {{ $.Values.inferenceExtension.latencyPredictor.predictionServers.livenessProbe.initialDelaySeconds }} | ||
| periodSeconds: {{ $.Values.inferenceExtension.latencyPredictor.predictionServers.livenessProbe.periodSeconds }} | ||
| readinessProbe: | ||
| httpGet: | ||
| path: {{ $.Values.inferenceExtension.latencyPredictor.predictionServers.readinessProbe.httpGet.path }} | ||
| port: {{ add $.Values.inferenceExtension.latencyPredictor.predictionServers.startPort $i }} | ||
| initialDelaySeconds: {{ $.Values.inferenceExtension.latencyPredictor.predictionServers.readinessProbe.initialDelaySeconds }} | ||
| periodSeconds: {{ $.Values.inferenceExtension.latencyPredictor.predictionServers.readinessProbe.periodSeconds }} | ||
| failureThreshold: {{ $.Values.inferenceExtension.latencyPredictor.predictionServers.readinessProbe.failureThreshold }} | ||
| resources: | ||
| {{- toYaml $.Values.inferenceExtension.latencyPredictor.predictionServers.resources | nindent 4 }} | ||
| envFrom: | ||
| - configMapRef: | ||
| name: {{ include "gateway-api-inference-extension.name" $ }}-latency-predictor-prediction | ||
| env: | ||
| - name: PREDICT_PORT | ||
| value: "{{ add $.Values.inferenceExtension.latencyPredictor.predictionServers.startPort $i }}" | ||
| - name: POD_NAME | ||
| valueFrom: | ||
| fieldRef: | ||
| fieldPath: metadata.name | ||
| - name: SERVER_TYPE | ||
| value: "prediction-{{ add $i 1 }}" | ||
| - name: TRAINING_SERVER_URL | ||
| value: "http://localhost:{{ $.Values.inferenceExtension.latencyPredictor.trainingServer.port }}" | ||
| volumeMounts: | ||
| - name: prediction-server-{{ add $i 1 }}-storage | ||
| mountPath: /server_models | ||
| {{- end }} | ||
| {{- end }} | ||
| {{- end }} | ||
| | ||
| {{/* | ||
| Latency Predictor Volumes | ||
| */}} | ||
| {{- define "gateway-api-inference-extension.latencyPredictor.volumes" -}} | ||
| {{- if .Values.inferenceExtension.latencyPredictor.enabled }} | ||
| - name: training-server-storage | ||
| emptyDir: | ||
| sizeLimit: {{ .Values.inferenceExtension.latencyPredictor.trainingServer.volumeSize }} | ||
| {{- range $i := until (int .Values.inferenceExtension.latencyPredictor.predictionServers.count) }} | ||
| - name: prediction-server-{{ add $i 1 }}-storage | ||
| emptyDir: | ||
| sizeLimit: {{ $.Values.inferenceExtension.latencyPredictor.predictionServers.volumeSize }} | ||
| {{- end }} | ||
| {{- end }} | ||
| {{- end }} | ||
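The template above derives every prediction sidecar from two values, `predictionServers.count` and `predictionServers.startPort`. As a reading aid, here is a minimal Python sketch of how the `range`/`until`/`add` loops appear to expand. The function names and the sample values (3 servers starting at port 8001) are illustrative assumptions, not part of the chart or its defaults.

```python
def prediction_server_urls(count, start_port):
    """Mirror the PREDICTION_SERVER_URL loop: one localhost URL per
    prediction server, joined with commas and no surrounding whitespace."""
    return ",".join(f"http://localhost:{start_port + i}" for i in range(count))


def prediction_sidecars(count, start_port):
    """Mirror the sidecar loop: names and volumes are 1-based (add $i 1),
    while ports are 0-based offsets from startPort (add startPort $i)."""
    return [
        {
            "name": f"prediction-server-{i + 1}",
            "port": start_port + i,
            "volume": f"prediction-server-{i + 1}-storage",
        }
        for i in range(count)
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only, not the chart defaults.
    print(prediction_server_urls(count=3, start_port=8001))
    for sidecar in prediction_sidecars(count=3, start_port=8001):
        print(sidecar)
```

One consequence worth noting: the training server port is configured separately (`trainingServer.port`), so it presumably needs to fall outside the range `startPort` to `startPort + count - 1` to avoid a port clash inside the pod.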