@nirrozenbaum
Contributor

Following the agreements made on #905, this PR implements another step toward the end goal of the new scheduler design. The main changes are:

  • Rename the profile-handling plugin to ProfileHandler and add the ProcessProfilesResults extension point (the function is called ProcessResults in the ProfileHandler interface, as agreed in the design discussion).
  • Add SchedulingResult, which includes the map of profile run results and the key of the primary profile, which the director code uses to set the destination header (in the future this can be changed to use a PreRequest plugin).
  • Add the ProcessResults call to the scheduler request flow.
  • Create CycleState per request rather than per profile.
  • Add a DEPRECATED comment on PostCycle. Once PostResponse is used in the prefix plugin, we should remove PostCycle.
  • Update all tests according to the above changes.

@netlify
Copy link

netlify bot commented Jun 8, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit ef498b9
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/6847f5e6097cab0008a6c742
😎 Deploy Preview https://deploy-preview-937--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot requested review from Jeffwan and robscott June 8, 2025 15:34
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 8, 2025
@nirrozenbaum
Contributor Author

cc @ahg-g @elevran @kfswain

@nirrozenbaum nirrozenbaum changed the title implement agreed points on scheduler redesign scheduler redesign Jun 9, 2025
@nirrozenbaum nirrozenbaum changed the title scheduler redesign scheduler redesign continuation Jun 9, 2025
Pick(ctx context.Context, request *types.LLMRequest, profiles map[string]*SchedulerProfile, executionResults map[string]*types.Result) map[string]*SchedulerProfile
// Pick selects the SchedulingProfiles to run from a list of candidate profiles, while taking into consideration the request properties
// and the previously executed SchedulerProfile cycles along with their results.
Pick(ctx context.Context, request *types.LLMRequest, profiles map[string]*SchedulerProfile, profileResults map[string]*types.ProfileRunResult) map[string]*SchedulerProfile
Contributor

Perhaps we need a different name so we don't confuse it with the endpoint picker plugin. One suggestion is Select.

Contributor Author

Generally speaking, I'd like to be consistent in terminology.
If we use a Pick function in the Picker interface, staying consistent means having a Pick function in the ProfileHandler interface as well.
I'm OK with changing it to Select, but in that case I'd consider changing it everywhere to keep it consistent,
e.g., renaming Picker to Selector and renaming the Pick function to Select in both places.

LMKWYT.

Contributor

What I was trying to address is this: if I search the code for pick, it would be better if most of the results were unambiguous and referred to one thing, such as endpoint picking. I have to admit, though, that select is also generic (it could mean label selection). I am fine with keeping it as pick; I was just sharing a thought I had.

ScorerPluginType = "Scorer"
PickerPluginType = "Picker"
PostCyclePluginType = "PostCycle"
ProfilePickerType = "ProfilePicker"
Contributor

rename to handler

Contributor Author

Here it's actually the extension point name and not the plugin name.
We want to record a metric for how long profile picking takes and how long ProcessProfilesResults takes.
This is why the list contains both (please ignore PostCycle, which will be removed as soon as possible):

const (
	ProfilePickerType          = "ProfilePicker"
	FilterPluginType           = "Filter"
	ScorerPluginType           = "Scorer"
	PickerPluginType           = "Picker"
	PostCyclePluginType        = "PostCycle"
	ProcessProfilesResultsType = "ProcessProfilesResults"
)
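
To illustrate why the constants name extension points rather than plugins, here is a minimal sketch of timing a call per extension point. It is only an assumption-laden example: `latencyRecorder`, `observe`, and the plugin name string are hypothetical, and an in-memory map stands in for whatever metrics backend the project actually uses. Only the two constant names and values are taken from the list above.

```go
package main

import (
	"fmt"
	"time"
)

// Extension point names, copied from the list above.
const (
	ProfilePickerType          = "ProfilePicker"
	ProcessProfilesResultsType = "ProcessProfilesResults"
)

// latencyRecorder is a hypothetical in-memory stand-in for a metrics backend.
type latencyRecorder struct {
	samples map[string][]time.Duration
}

func newLatencyRecorder() *latencyRecorder {
	return &latencyRecorder{samples: map[string][]time.Duration{}}
}

// observe runs fn and stores its duration under "<extension point>/<plugin>".
func (r *latencyRecorder) observe(extensionPoint, pluginName string, fn func()) {
	start := time.Now()
	fn()
	key := extensionPoint + "/" + pluginName
	r.samples[key] = append(r.samples[key], time.Since(start))
}

func main() {
	rec := newLatencyRecorder()
	// One plugin (the profile handler) is measured at two extension points.
	rec.observe(ProfilePickerType, "example-profile-handler", func() {})
	rec.observe(ProcessProfilesResultsType, "example-profile-handler", func() {})
	for key, s := range rec.samples {
		fmt.Println(key, len(s))
	}
}
```

Keying by extension point means the same ProfileHandler plugin yields two separate latency series, one for picking profiles and one for processing their results.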

Contributor

sg

type Result struct {
// ProfileRunResult captures the profile run result.
type ProfileRunResult struct {
TargetPod Pod
Contributor

Isn't this supposed to be a list, or is that planned for a follow-up?

Contributor Author

Yes, it should be a list, but that is not addressed in this PR, which focused on the ProfileHandler changes and on adding the ProcessProfilesResults extension point.
It will be addressed in a follow-up.

Signed-off-by: Nir Rozenbaum <[email protected]>
@ahg-g
Contributor

ahg-g commented Jun 10, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 10, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, nirrozenbaum

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 10, 2025
@k8s-ci-robot k8s-ci-robot merged commit e958cbd into kubernetes-sigs:main Jun 10, 2025
7 of 8 checks passed
@nirrozenbaum nirrozenbaum deleted the profile-handler branch June 11, 2025 03:23
rlakhtakia pushed a commit to rlakhtakia/gateway-api-inference-extension that referenced this pull request Jun 11, 2025
* implement agreed points on scheduler redesign

Signed-off-by: Nir Rozenbaum <[email protected]>

* addressed code review comments

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
3 participants