Training server ensemble #2473
Merged
k8s-ci-robot merged 1 commit into kubernetes-sigs:main on Mar 5, 2026
✅ Deploy Preview for gateway-api-inference-extension ready!
Contributor: /lgtm
a4ee705 to 5486d30
Contributor (Author): Actually, it seems someone already added it; I just need to rebase once it lands. #2484
5486d30 to a5e6642
Contributor: [APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: BenjaminBraunDev, kaushikmitr.
Contributor: /lgtm
RyanRosario pushed a commit to RyanRosario/gateway-api-inference-extension that referenced this pull request on Mar 9, 2026
This pull request introduces a gated ensemble training approach for the latency predictor. The system trains and saves separate models for the "noqueue" and "queued" regimes and wraps them into a single serializable object, which simplifies deployment and prediction logic. The changes also add configuration options for ensemble mode, new metrics, and API support for listing and downloading the ensemble models.
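The gating idea can be illustrated with a minimal sketch. It assumes a request counts as "queued" when a queue-depth feature (here the hypothetical num_request_waiting) is greater than zero, and it uses toy constant models in place of real regressors. GatedEnsemble and ConstantModel are illustrative names, not the PR's QueueGatedModel API. Because both sub-models live in one object, a single serialized file is enough for prediction; the sketch uses the standard-library pickle module, and joblib.dump/joblib.load serialize objects in a similar, pickle-based way.

```python
import pickle
from dataclasses import dataclass
from typing import Dict, List, Protocol, Sequence

Row = Dict[str, float]


class Regressor(Protocol):
    """Anything with a batch predict() method (e.g. a fitted regressor)."""

    def predict(self, rows: Sequence[Row]) -> List[float]: ...


@dataclass
class ConstantModel:
    """Toy stand-in for a trained model: always predicts `value`."""

    value: float

    def predict(self, rows: Sequence[Row]) -> List[float]:
        return [self.value for _ in rows]


@dataclass
class GatedEnsemble:
    """Hypothetical queue-gated wrapper around two regime-specific models."""

    noqueue_model: Regressor
    queued_model: Regressor
    queue_feature: str = "num_request_waiting"  # assumed gating feature name

    def predict(self, rows: Sequence[Row]) -> List[float]:
        out = [0.0] * len(rows)
        # Split row indices by regime: queue depth 0 -> "noqueue", > 0 -> "queued".
        noqueue_idx = [i for i, r in enumerate(rows) if r[self.queue_feature] <= 0]
        queued_idx = [i for i, r in enumerate(rows) if r[self.queue_feature] > 0]
        for idx, model in ((noqueue_idx, self.noqueue_model),
                           (queued_idx, self.queued_model)):
            if idx:
                # One batched call per sub-model, then scatter results back in order.
                preds = model.predict([rows[i] for i in idx])
                for i, p in zip(idx, preds):
                    out[i] = p
        return out


# Usage: the whole ensemble round-trips through a single serialized blob.
ensemble = GatedEnsemble(ConstantModel(10.0), ConstantModel(50.0))
restored = pickle.loads(pickle.dumps(ensemble))
print(restored.predict([{"num_request_waiting": 0}, {"num_request_waiting": 3}]))
# [10.0, 50.0]
```

Callers then make a single predict() call and never need to know which regime a row falls into; batching rows per regime keeps it to one call per sub-model rather than one per row.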
Ensemble Model Training and Management
… QueueGatedModel class. The ensemble is activated only if sufficient samples exist for all sub-models. [1] [2] [3]
… (ttft_gated.joblib, tpot_gated.joblib) and included in the model listing, download, and info API endpoints. [1] [2] [3] [4]

Configuration and Settings
… Settings for enabling ensemble mode, specifying the minimum number of samples required for the ensemble split, and setting paths for the gated ensemble model files. The default maximum training data size per bucket was also reduced.

Metrics and Monitoring
API Enhancements
… /data_status API now reports sample counts for each regime and ensemble configuration details, which aids monitoring and debugging of the ensemble training process. [1] [2]

Together, these changes let latency prediction account for queueing effects and improve the observability and manageability of the model lifecycle.
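The activation rule (enable the ensemble only when every sub-model has enough samples) and the per-regime counts reported by /data_status can be sketched together. This is a minimal illustration under stated assumptions: the setting names (ensemble_enabled, min_samples_for_ensemble_split), the rule that a sample is "queued" when its queue depth is above zero, and the payload keys are all hypothetical, not the PR's actual Settings fields or response schema.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping


@dataclass
class EnsembleSettings:
    # Hypothetical names and defaults; the PR adds settings of this kind,
    # but their exact names and values are not stated here.
    ensemble_enabled: bool = True
    min_samples_for_ensemble_split: int = 100


def count_by_regime(queue_depths: Iterable[int]) -> Dict[str, int]:
    """Count samples per regime, assuming 'queued' means queue depth > 0."""
    counts = {"noqueue": 0, "queued": 0}
    for depth in queue_depths:
        counts["queued" if depth > 0 else "noqueue"] += 1
    return counts


def ensemble_active(counts: Mapping[str, int], settings: EnsembleSettings) -> bool:
    """Activate the gated ensemble only if every sub-model has enough samples."""
    return settings.ensemble_enabled and all(
        n >= settings.min_samples_for_ensemble_split for n in counts.values()
    )


def data_status(queue_depths: Iterable[int], settings: EnsembleSettings) -> dict:
    """Sketch of a /data_status-style payload (keys are illustrative)."""
    counts = count_by_regime(queue_depths)
    return {
        "samples_by_regime": counts,
        "ensemble": {
            "enabled": settings.ensemble_enabled,
            "min_samples_for_ensemble_split": settings.min_samples_for_ensemble_split,
            "active": ensemble_active(counts, settings),
        },
    }


# Usage: three no-queue samples and two queued samples with a threshold of 2.
settings = EnsembleSettings(min_samples_for_ensemble_split=2)
status = data_status([0, 0, 3, 1, 0], settings)
print(status["samples_by_regime"], status["ensemble"]["active"])
# {'noqueue': 3, 'queued': 2} True
```

Requiring the threshold for every regime, not just in total, avoids training a "queued" sub-model on a handful of samples when traffic rarely queues.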