Collaborator

@shmuelk shmuelk commented Jun 29, 2025

This PR:

  • Adds factory functions for all plugins in the repository
  • Registers the above plugin factory functions
  • Makes minor changes to some plugin constructors so that they no longer read environment variables

This PR completes steps three and four of issue #201
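
For readers unfamiliar with the pattern, the following is a minimal sketch of a factory registry, not the repository's code. The names `Plugin`, `FactoryFunc`, `Register`, `Instantiate`, and `echoFilter` are assumptions made for illustration; the real types and signatures live in the GIE `plugins` package and may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Plugin is a hypothetical stand-in for the real plugin interface.
type Plugin interface {
	Name() string
}

// FactoryFunc builds a plugin from its configured name and raw JSON parameters.
type FactoryFunc func(name string, rawParameters json.RawMessage) (Plugin, error)

// registry maps a plugin type to the factory that creates it.
var registry = map[string]FactoryFunc{}

// Register associates a plugin type with its factory function.
func Register(pluginType string, factory FactoryFunc) {
	registry[pluginType] = factory
}

// echoFilter is an illustrative plugin that takes no parameters.
type echoFilter struct{ name string }

func (f *echoFilter) Name() string { return f.name }

// echoFilterFactory is the factory function for echoFilter.
func echoFilterFactory(name string, _ json.RawMessage) (Plugin, error) {
	return &echoFilter{name: name}, nil
}

// Instantiate looks up the factory for pluginType and invokes it.
func Instantiate(pluginType, name string, raw json.RawMessage) (Plugin, error) {
	factory, ok := registry[pluginType]
	if !ok {
		return nil, fmt.Errorf("plugin type %q is not registered", pluginType)
	}
	return factory(name, raw)
}

func main() {
	Register("echo-filter", echoFilterFactory)
	p, err := Instantiate("echo-filter", "my-echo", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Name())
}
```

Keeping registration separate from construction lets a configuration loader create plugins by type name without importing each plugin's constructor directly.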

@shmuelk shmuelk requested review from elevran and mayabar June 29, 2025 11:16
@nirrozenbaum
Collaborator

verifying - was this tested e2e?

)

type loadAwareScorerParameters struct {
Threshold float64 `json:"threshold"`
Collaborator

should be int

Collaborator Author

The threshold here is a float64. It was read from an environment variable and converted to a float64

Collaborator

Sorry, I saw GetEnvInt in the old code and got confused. But in the old code it was also converted into float64.
Does it make sense that queue length is a float and not an int? Shouldn't we treat queue length as an int?

Collaborator Author

You're probably right. It may be stored as a float64 just to avoid constantly casting the value in calculations.

I made the field an int.
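
A minimal sketch of what the integer field implies, assuming the parameters are decoded with `encoding/json`. The `parseParameters` helper and the `loadRatio` calculation are hypothetical and are not the scorer's actual logic; they only show that the value is converted to `float64` at the point of use.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// loadAwareScorerParameters mirrors the struct under review, with an int threshold.
type loadAwareScorerParameters struct {
	Threshold int `json:"threshold"`
}

// parseParameters (hypothetical helper) decodes raw JSON into the typed struct.
func parseParameters(raw []byte) (loadAwareScorerParameters, error) {
	var p loadAwareScorerParameters
	err := json.Unmarshal(raw, &p)
	return p, err
}

// loadRatio is an illustrative calculation, not the scorer's formula.
// The integer values are converted to float64 only where the arithmetic needs it.
func loadRatio(queueLength, threshold int) float64 {
	return float64(queueLength) / float64(threshold)
}

func main() {
	p, err := parseParameters([]byte(`{"threshold": 128}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(loadRatio(32, p.Threshold))

	// A fractional threshold is rejected by the decoder when the field is an int.
	_, err = parseParameters([]byte(`{"threshold": 2.5}`))
	fmt.Println(err != nil)
}
```

With an `int` field, a fractional queue-length threshold in the configuration surfaces as a decode error instead of being silently accepted.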

Comment on lines 26 to 27
RedisAddres string `json:"redisAddress"`
HfToken string `json:"hfToken"`
Collaborator

Don't you think this is problematic?
Putting a personal access token, or a connection string that includes a password, in a configuration file?

Collaborator Author

I reverted the code here. These two fields are again fetched from environment variables.

Collaborator

thanks.
I recommend also adding a comment about sensitive information in GIE Config API documentation. e.g., something like:

it's not recommended to store sensitive information in the plugin configuration and this kind of information should be stored differently (e.g., using env vars).

*not related to this PR.


// RegisterAllPlugins registers the factory functions of all plugins in this repository.
func RegisterAllPlugins() {
	plugins.Register(filter.ByLabelsFilterType, filter.ByLabelFactory)
Collaborator

why do we register an example filter only for documentation purpose?

Collaborator Author

I registered all of the plugins

Comment on lines 41 to 44
rawPlugin := handle.Plugins().Plugin(parameters.PrefixScorerRef)
if rawPlugin == nil {
	return nil, fmt.Errorf("there was no plugin with the name '%s' defined", parameters.PrefixScorerRef)
}
Collaborator

this is why I suggested returning in GIE (rawPlugin, bool). this check shouldn't be in the responsibility of a config api consumer, this error should return from GIE.
not a big deal and certainly not a blocker, but I think this should be changed in GIE. otherwise, every plugin that may have ref to another plugin would need to write the error (and potentially different developers might write the error differently, making it harder to debug due to inconsistencies in the log).

Comment on lines 41 to 46
rawPlugin := handle.Plugins().Plugin(parameters.PrefixScorerRef)
if rawPlugin == nil {
	return nil, fmt.Errorf("there was no plugin with the name '%s' defined", parameters.PrefixScorerRef)
}
prefixScorer, ok := rawPlugin.(*scorer.PrefixAwareScorer)
if !ok {
	return nil, fmt.Errorf("the plugin with the name '%s' was not an instance of the PrefixAwareScorer", parameters.PrefixScorerRef)
}
Collaborator

All of this logic can be implemented as a generic function inside GIE, so the same code does not have to be repeated every time a plugin references a different plugin.

see an example here:
https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/epp/scheduling/types/cycle_state.go#L97-L111

Collaborator Author

Updated code to use GIE helper
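
The kind of generic helper suggested in this thread might look like the sketch below. It is an illustration only: `pluginLookup`, `pluginByType`, `prefixScorer`, and `loadScorer` are hypothetical names, and the actual GIE helper may have a different name and signature.

```go
package main

import "fmt"

// Plugin is a hypothetical stand-in for the real plugin interface.
type Plugin interface {
	Name() string
}

// pluginLookup is a hypothetical stand-in for the handle's plugin collection.
type pluginLookup map[string]Plugin

// pluginByType returns the plugin registered under name, asserted to type P.
// Both the lookup failure and the type mismatch produce one consistent error format.
func pluginByType[P Plugin](plugins pluginLookup, name string) (P, error) {
	var zero P
	raw, ok := plugins[name]
	if !ok {
		return zero, fmt.Errorf("there is no plugin with the name '%s' defined", name)
	}
	typed, ok := raw.(P)
	if !ok {
		return zero, fmt.Errorf("the plugin with the name '%s' is not an instance of %T", name, zero)
	}
	return typed, nil
}

// prefixScorer and loadScorer are illustrative plugin types.
type prefixScorer struct{}

func (*prefixScorer) Name() string { return "prefix" }

type loadScorer struct{}

func (*loadScorer) Name() string { return "load" }

func main() {
	plugins := pluginLookup{"prefix": &prefixScorer{}, "load": &loadScorer{}}

	p, err := pluginByType[*prefixScorer](plugins, "prefix")
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Name())

	// Asking for the wrong type yields an error instead of a failed assertion at the call site.
	_, err = pluginByType[*prefixScorer](plugins, "load")
	fmt.Println(err)
}
```

Centralizing the lookup and the type assertion means every plugin that references another plugin reports failures with the same wording, which addresses the log-consistency concern raised above.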

Collaborator Author

shmuelk commented Jun 29, 2025

> verifying - was this tested e2e?

Not yet. This PR only adds the factory functions and registers them.

The idea was to break things up into smaller chunks to make it easier to review.

@shmuelk shmuelk force-pushed the register-plugins branch from 0caf38d to 640b3f1 Compare June 30, 2025 15:00
shmuelk added 4 commits July 1, 2025 12:52
Signed-off-by: Shmuel Kallner <[email protected]>
@shmuelk shmuelk force-pushed the register-plugins branch from af178e3 to eb8552e Compare July 1, 2025 09:58
@shmuelk shmuelk requested a review from kfirtoledo July 1, 2025 09:59
Comment on lines -25 to +39
return nil, errors.New("ByLabels: missing filter name")
return nil, errors.New("ByLabelSelector: missing filter name")
Collaborator

why is it mandatory to specify name?
it's optional in all other plugins..

Comment on lines +38 to +43
parameters := pdProfileHandlerParameters{
	Config: prefix.Config{
		HashBlockSize:          prefix.DefaultHashBlockSize,
		MaxPrefixBlocksToMatch: prefix.DefaultMaxPrefixBlocks,
		LRUCapacityPerServer:   prefix.DefaultLRUCapacityPerServer,
	},
Collaborator

Does it make sense for the prefix config to appear on the profile handler plugin?
Shouldn't it appear on the prefix plugin only?
Something about this configuration does not look natural.

Collaborator Author

This configuration is from the previous change to eliminate our own Prefix Scorer

}
return NewPdProfileHandler(cfg).WithName(name), nil
}

Collaborator

Comment on line 114 (I can't put the comment there because it wasn't changed in this PR):

prefixState, err := types.ReadCycleStateKey[*prefix.SchedulingContextState](cycleState, prefix.PrefixCachePluginType)
if err != nil {
	log.FromContext(ctx).Error(err, "unable to read prefix state")
	return map[string]*framework.SchedulerProfile{}
}

Is this the expected behavior? If the prefix scorer failed to write the prefix state, we don't do PD?
cc @kfirtoledo

Collaborator Author

This code is from the previous change to eliminate our own Prefix Scorer
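
To make the behavior being questioned concrete, here is a small sketch of a typed read from per-request state where a failed read results in an empty set of profiles. Everything in it is a placeholder: `cycleState`, `readKey`, `prefixState`, the `"prefix-cache"` key, and the `"decode"` profile name are assumptions, and the profile selection is deliberately trivial.

```go
package main

import "fmt"

// cycleState is a hypothetical stand-in for the scheduler's per-request state.
type cycleState map[string]any

// readKey returns the value stored under key, asserted to type T.
func readKey[T any](state cycleState, key string) (T, error) {
	var zero T
	raw, ok := state[key]
	if !ok {
		return zero, fmt.Errorf("key %q not found in cycle state", key)
	}
	typed, ok := raw.(T)
	if !ok {
		return zero, fmt.Errorf("key %q holds a %T, not the requested type", key, raw)
	}
	return typed, nil
}

// prefixState is an illustrative value written by an earlier plugin.
type prefixState struct{ matchedBlocks int }

// pickProfiles mirrors the questioned behavior: when the prefix state cannot be
// read, it logs the error and returns no profiles at all.
func pickProfiles(state cycleState) map[string]bool {
	if _, err := readKey[*prefixState](state, "prefix-cache"); err != nil {
		fmt.Println("unable to read prefix state:", err)
		return map[string]bool{}
	}
	return map[string]bool{"decode": true} // placeholder selection logic
}

func main() {
	fmt.Println(len(pickProfiles(cycleState{})))
	fmt.Println(len(pickProfiles(cycleState{"prefix-cache": &prefixState{matchedBlocks: 3}})))
}
```

In this shape, a missing or mistyped state entry prevents any profile from being selected, which is the dependency on the prefix scorer that the reviewer is asking about.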

Comment on lines +41 to +42
MaxPrefixBlocksToMatch: prefix.DefaultMaxPrefixBlocks,
LRUCapacityPerServer: prefix.DefaultLRUCapacityPerServer,
Collaborator

is profile handler using these values?

Collaborator

@nirrozenbaum nirrozenbaum left a comment

/lgtm
/approve

PR looks great, thanks!

I left two non-blocking minor comments/questions, one for @mayabar and one for @kfirtoledo that could be addressed (if needed?) in follow ups.

@shmuelk shmuelk merged commit 9063d2e into llm-d:main Jul 1, 2025
2 checks passed
@shmuelk shmuelk deleted the register-plugins branch July 1, 2025 11:54
pierDipi pushed a commit to pierDipi/llm-d-inference-scheduler that referenced this pull request Nov 28, 2025
Signed-off-by: konflux-internal-p02 <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
Co-authored-by: konflux-internal-p02[bot] <170854209+konflux-internal-p02[bot]@users.noreply.github.com>