The current saturation detector decides whether the system is saturated based on two main per-pod metrics:
the waiting queue size and the KV cache utilization.
More details here:
https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/epp/saturationdetector/saturationdetector.go
Users of IGW may want to define different criteria for saturation, not necessarily based on these metrics.
To make the saturation check flexible, it should become an extension point, and the current code can become one implementation of that extension point (we may ship it as the default plugin).
This change will also clean up the saturation config.go file, which defines env vars for setting the thresholds of the current saturation checks; those thresholds will become parameters of the plugin.
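
To make the proposal concrete, here is a minimal Go sketch of what such an extension point could look like. All names in it (`SaturationDetector`, `PodMetrics`, `ThresholdDetector`, and the threshold fields) are hypothetical and chosen for illustration; they are not the existing API. The sketch also assumes that the system counts as saturated when no pod is within both thresholds, and the threshold values are illustrative only.

```go
package main

import "fmt"

// PodMetrics is a hypothetical, simplified view of the two per-pod metrics
// mentioned above; the real types in the repository may differ.
type PodMetrics struct {
	WaitingQueueSize   int
	KVCacheUtilization float64 // fraction in [0, 1]
}

// SaturationDetector is a hypothetical extension point: any plugin that can
// decide whether the system is saturated implements it.
type SaturationDetector interface {
	IsSaturated(pods []PodMetrics) bool
}

// ThresholdDetector sketches the current behavior as a default plugin.
// Its thresholds are plain plugin parameters instead of env vars.
type ThresholdDetector struct {
	QueueDepthThreshold  int
	KVCacheUtilThreshold float64
}

// IsSaturated reports true when no pod is within both thresholds.
// Assumption: an empty pod list counts as saturated.
func (d ThresholdDetector) IsSaturated(pods []PodMetrics) bool {
	for _, p := range pods {
		if p.WaitingQueueSize <= d.QueueDepthThreshold &&
			p.KVCacheUtilization <= d.KVCacheUtilThreshold {
			return false // at least one pod still has capacity
		}
	}
	return true
}

func main() {
	// Illustrative threshold values, not necessarily the project defaults.
	var detector SaturationDetector = ThresholdDetector{
		QueueDepthThreshold:  5,
		KVCacheUtilThreshold: 0.8,
	}
	pods := []PodMetrics{
		{WaitingQueueSize: 9, KVCacheUtilization: 0.95},
		{WaitingQueueSize: 2, KVCacheUtilization: 0.40},
	}
	fmt.Println(detector.IsSaturated(pods))     // prints false: the second pod has capacity
	fmt.Println(detector.IsSaturated(pods[:1])) // prints true: only the overloaded pod remains
}
```

With an interface like this, a user who wants different saturation criteria would provide another implementation of `SaturationDetector` and select it through plugin configuration, without touching the threshold-based default.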