Skip to content

Refactor: Prepare EPP SaturationDetection as an Extension Point#1976

Merged
k8s-ci-robot merged 1 commit intokubernetes-sigs:mainfrom
LukeAVanDrie:refactor/saturation-control-dir-structure
Dec 19, 2025
Merged

Refactor: Prepare EPP SaturationDetection as an Extension Point#1976
k8s-ci-robot merged 1 commit intokubernetes-sigs:mainfrom
LukeAVanDrie:refactor/saturation-control-dir-structure

Conversation

@LukeAVanDrie
Copy link
Copy Markdown
Contributor

@LukeAVanDrie LukeAVanDrie commented Dec 8, 2025

What type of PR is this?
/kind cleanup

What this PR does / why we need it:
This PR refactors the directory structure for the saturation control logic within the Endpoint Picker (EPP). It moves the current implementation to framework/plugins/utilizationdetector.

This change is a preparatory step to establish Saturation Detector as a formal extension point within the EPP, allowing
for multiple implementations. The existing logic is now housed as the utilizationdetector plugin, making way
for future additions like concurrencydetector.

This PR contains only file moves and renames; no functional changes are introduced. Note: it is not aligned with the
EPP plugin system yet. This is a preparatory step only.

Which issue(s) this PR fixes:
Tracks #1405 ('cc @nirrozenbaum) and #1793

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Dec 8, 2025
@netlify
Copy link
Copy Markdown

netlify Bot commented Dec 8, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 6f17d94
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/694488b05fae89000859c4d4
😎 Deploy Preview https://deploy-preview-1976--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 8, 2025
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 8, 2025
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Hi @LukeAVanDrie. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 8, 2025
@LukeAVanDrie LukeAVanDrie changed the title Refactor: Relocate SaturationDetector Refactor: Prepare EPP SaturationControl as an Extension Point Dec 8, 2025
@ahg-g
Copy link
Copy Markdown
Contributor

ahg-g commented Dec 8, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 8, 2025
@LukeAVanDrie LukeAVanDrie force-pushed the refactor/saturation-control-dir-structure branch from a6a78b8 to 06c821f Compare December 9, 2025 01:07
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 9, 2025
@nirrozenbaum
Copy link
Copy Markdown
Contributor

/approve

FWIW, I think we should draw on board (or paper :) ) the updated design with the updated extension points and make sure we're aligned on how we think EPP should end up.
This is not a comment about this PR but more of a general comment, mainly because there are multiple different threads that push changes in parallel into our pluggability story (cc: @kfswain @ahg-g).

will stamp with lgtm once the CI tests pass (should fix boilerplate headers).

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 9, 2025
@kfswain
Copy link
Copy Markdown
Collaborator

kfswain commented Dec 9, 2025

/approve

FWIW, I think we should draw on board (or paper :) ) the updated design with the updated extension points and make sure we're aligned on how we think EPP should end up. This is not a comment about this PR but more of a general comment, mainly because there are multiple different threads that push changes in parallel into our pluggability story (cc: @kfswain @ahg-g).

will stamp with lgtm once the CI tests pass (should fix boilerplate headers).

I agree with this. There is another PR that is proposing to fundamentally change how plugins can operate, and I'm worried about the precedent that sets.

@LukeAVanDrie LukeAVanDrie force-pushed the refactor/saturation-control-dir-structure branch from 06c821f to 539d5c6 Compare December 9, 2025 21:51
@nirrozenbaum
Copy link
Copy Markdown
Contributor

I agree with this. There is another PR that is proposing to fundamentally change how plugins can operate, and I'm worried about the precedent that sets.

@kfswain not sure which other PR is the one you mentioned. from my point of view this PR can be stamped (it's just putting the existing saturation detector under a different directory).

leaving to Kellen the final stamp just to be on the safe side.

@LukeAVanDrie
Copy link
Copy Markdown
Contributor Author

LukeAVanDrie commented Dec 10, 2025

@kfswain not sure which other PR is the one you mentioned. from my point of view this PR can be stamped (it's just putting the existing saturation detector under a different directory).

He's referring to my other PR: #1977. I am working on the configuration and extensibility story for Flow Control. Flow Control has two potential extension points: interflow (fairness) and intraflow (ordering) policies. These, however, are scoped to priority bands or flows respectively and not well-supported by the singleton plugin model in their current state. I have a draft to support transient lifecycle for stateful, scoped plugin instances.

Shmuel recommended a stateless singleton approach relying on state-passing which I am now exploring instead as it is more closely aligned with our existing plugin model and less easy to abuse.


FWIW, I think we should draw on board (or paper :) ) the updated design with the updated extension points and make sure we're aligned on how we think EPP should end up.

This will be a good exercise. I am currently working toward adding (or rather promoting) the following extension points for our Flow Control story:

  • InterFlowDispatchPolicy (exposing SelectQueue): the fairness policy for picking which flow gets the next dispatch opportunity
  • IntraFlowDispatchPolicy (exposing Compare): the temporal scheduling policy that is responsible for defining the comparator used to sort the requests within a flow's queue
  • SaturationController (exposing ShouldDispatch) $\to$ formerly SaturationDetector

These then compose nicely into a flow control / scheduling regime.

  • Serve requests with Virtual Token Count (VTC) fairness (InterFlowDispatchPolicy) between flows ordered by Earliest Deadline First (IntraFlowDispatchPolicy) subject to a hard concurrency limit per pod (SaturationController) for pool saturation.
  • Serve requests with Round Robin fairness between flows ordered by FCFS subject to queue depth and kv-cache utilization thresholds for pool saturation.

And so on... These can all be mixed and matched.


I figured it was generally agreed upon to start this for saturation detection given #1405. I plan on doing exactly what's in the bug. Promoting our simple heuristic-based detector to the default implementation before adding a new in-tree concurrency-limit based controller as a followup.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 10, 2025
@nirrozenbaum
Copy link
Copy Markdown
Contributor

nirrozenbaum commented Dec 10, 2025

as a side note I must admit that I’ve read the explanation about what is InterFlowPolicy and what is IntraFlowPolicy at least 20 times in different threads, and I still find these names very confusing.

do you think we can rename it to a descriptive name- just by what it does?
e.g., FairnessDispatchPolicy, FlowOrderingPolicy, etc

@LukeAVanDrie
Copy link
Copy Markdown
Contributor Author

LukeAVanDrie commented Dec 10, 2025

as a side note I must admit that I’ve read the explanation about what is InterFlowPolicy and what is IntraFlowPolicy at least 20 times in different threads, and I still find these names very confusing.

do you think we can rename it to a descriptive name- just by what it does? e.g., FairnessDispatchPolicy, FlowOrderingPolicy, etc

Yes, these can be changed. They are not sticky at all yet. I think FairnessPolicy or FairnessStrategy is clear and to the point and will be the first type I migrate to the plugin model. When I open that PR (moving InterFlowDispatchPolicy definition to a plugins.go file and adding embedding plugins.Plugin for TypedName), we can discuss names there.

These will eventually be reflected in our configuration guide docs and such, so we should pick the best user-facing names.


Edit: mocking the config example makes choices more clear. Thoughts?

fairnessPolicy:
  pluginRef: virtual-token-count
orderingPolicy:
  pluginRef: earliest-deadline-first

@LukeAVanDrie LukeAVanDrie force-pushed the refactor/saturation-control-dir-structure branch from 539d5c6 to 2e3fb2f Compare December 16, 2025 01:15
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 16, 2025
@LukeAVanDrie
Copy link
Copy Markdown
Contributor Author

Rebased and resolved merge conflict

@LukeAVanDrie
Copy link
Copy Markdown
Contributor Author

/assign @nirrozenbaum

PR should be in a ready state to merge now. Thanks for the feedback!

Comment thread test/integration/epp/hermetic_test.go Outdated
"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/metrics"
"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/requestcontrol"
"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/saturationdetector"
"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/saturationcontrol/framework/plugins/staticthresholdcontroller"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the use of the word controller in the name here. The EPP is a K8S controller for several CRDs (and growing). I think that is the only context for use of the word controller in a name.

Is the plan for this new extension point to control saturation or just detect saturation and let other components decide what to do?

Also as the word static is used in the name here, is there an intention to a dynamic threshold plugin?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ agree with @shmuelk's comments.
why do we rename from detector to controller? that is confusing when reading this code in the context of k8s.

Copy link
Copy Markdown
Contributor Author

@LukeAVanDrie LukeAVanDrie Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shmuelk Thanks for the feedback. I agree that "Controller" is overloaded in the K8s ecosystem and we should avoid confusion with CRD reconcilers. However, I want to move away from "Detector" because the scope of this component is expanding. In the legacy implementation, it was a passive observer returning a boolean. I have some new implementations that act as a closed-loop control systems. They manage internal state and actively calculate dispatch rates or concurrency limits based on feedback variables.

Roadmap:

  • Static Thresholds: (This PR) Throttles based on statically configured metric thresholds.
  • Static Concurrency: Throttles based on statically configured concurrency limits per-pod.
  • Adaptive/Dynamic Concurrency (exploratory/stretch goal): A feedback loop that auto-tune limits.

So, we have a component that actively manages request flow (via its process variable, "saturation") and is somewhat steeped in control theory. This is how I landed on "Controller."

To avoid overloading this term, I suggest renaming this extension point to "SaturationRegulator". This preserves the semantics distinguishing it from passive observation. The path would look like: pkg/epp/saturationregulator/framework/plugins/staticthresholdregulator/regulator.go.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nirrozenbaum and @shmuelk

Thanks again for the feedback! I’ve updated the PR to stick with the 'Detector' terminology to align with existing conventions and your suggestions. To better reflect the specific logic of this implementation and the pluggable architecture, I've made the following changes:

  1. Renamed to utilizationdetector: This maintains the 'Detector' suffix you requested while being more descriptive and less academic than the previous 'staticthreshold' proposal. It specifically detects saturation based on resource utilization (Queue/KV Cache).
  2. Path Changes: I've moved the logic into: pkg/epp/saturationdetector/framework/plugins/utilizationdetector/detector.go.

My planned concurrency plugins then become: concurrencydetector and adaptiveconcurrencydetector (or similar).

I have force-pushed these changes over the previous commit. I wanted to avoid a messy history of chained git mv commands and ensure the file history for the original detector logic remains clean and easy to follow in the new path.

Hopefully this resolves your concerns here. PTAL!

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me! Thanks Luke!

This commit moves the existing `SaturationDetector` implementation to
its new home under the EPP plugin framework structure. This is a
preparatory step towards making `SaturationDetector` an official EPP
extension point.

- Moved files to `.../framework/plugins/utilizationdetector`
- Renamed `saturationdetector.go` to `detector.go`
- Renamed `saturationdetector_test.go` to `detector_test.go`
- Fixed imports

No functional changes are included in this commit.
@LukeAVanDrie LukeAVanDrie force-pushed the refactor/saturation-control-dir-structure branch from 2e3fb2f to 6f17d94 Compare December 18, 2025 23:05
@LukeAVanDrie LukeAVanDrie changed the title Refactor: Prepare EPP SaturationControl as an Extension Point Refactor: Prepare EPP SaturationDetection as an Extension Point Dec 18, 2025
@kfswain
Copy link
Copy Markdown
Collaborator

kfswain commented Dec 19, 2025

/approve

Will leave final stamp to others! Thanks! Stoked about this

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kfswain, LukeAVanDrie, nirrozenbaum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [kfswain,nirrozenbaum]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@nirrozenbaum
Copy link
Copy Markdown
Contributor

/lgtm

Thanks Luke 🙏🏼

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 19, 2025
@k8s-ci-robot k8s-ci-robot merged commit 53579ac into kubernetes-sigs:main Dec 19, 2025
18 checks passed
LukeAVanDrie added a commit to LukeAVanDrie/gateway-api-inference-extension that referenced this pull request Dec 21, 2025
LukeAVanDrie added a commit to LukeAVanDrie/gateway-api-inference-extension that referenced this pull request Dec 22, 2025
LukeAVanDrie added a commit to LukeAVanDrie/gateway-api-inference-extension that referenced this pull request Jan 5, 2026
k8s-ci-robot pushed a commit that referenced this pull request Jan 5, 2026
* test: promote lifecycle logic to shared utils

Refactor the integration test utilities to provide a standardized way to
start hermetic gRPC servers, and migrate the BBR harness to use it.

- Add `StartExtProcServer`: A helper for background server management
  with strict TCP readiness checks and 127.0.0.1 binding.
- Update `GetFreePort`: Return `int` instead of `*net.TCPAddr` for
  easier consumption.
- Refactor `BBRHarness`: Remove manual server startup boilerplate in
  favor of the new shared helper, reducing code duplication.

* test/epp: introduce isolated test harness and DSL

Add a new testing infrastructure for the Endpoint Picker (EPP) to enable
hermetic, namespace-isolated testing.

- Add `TestHarness`: Manages a namespace-scoped Controller Manager and
  server instance per test case. This ensures resources from one test
  do not pollute the scheduler state of another.
- Add DSL: Introduce intent-based builders (e.g., `ReqSubset`, `P`) and
  assertions (e.g., `ExpectRouteTo`) to decouple test logic from gRPC
  boilerplate.
- Support deterministic synchronization by waiting for Prometheus
  metrics to settle before generating traffic.

* test/epp: migrate integration suite to new harness

Rewrite the hermetic integration suite to use the new isolated harness
and DSL.

- Migrate all existing test cases from epp_test.go to hermetic_test.go.
- Enable pre-loading of base CRDs in TestMain to optimize setup time.
- Remove legacy setup code and global variable reliance.

* resolve conflicts from #1976

* Use serverCtx instead of ctx.

* Rebase and resolve conflicts
davidbreitgand pushed a commit to davidbreitgand/gateway-api-inference-extension that referenced this pull request Jan 6, 2026
This commit moves the existing `SaturationDetector` implementation to
its new home under the EPP plugin framework structure. This is a
preparatory step towards making `SaturationDetector` an official EPP
extension point.

- Moved files to `.../framework/plugins/utilizationdetector`
- Renamed `saturationdetector.go` to `detector.go`
- Renamed `saturationdetector_test.go` to `detector_test.go`
- Fixed imports

No functional changes are included in this commit.
davidbreitgand pushed a commit to davidbreitgand/gateway-api-inference-extension that referenced this pull request Jan 6, 2026
…ernetes-sigs#2022)

* test: promote lifecycle logic to shared utils

Refactor the integration test utilities to provide a standardized way to
start hermetic gRPC servers, and migrate the BBR harness to use it.

- Add `StartExtProcServer`: A helper for background server management
  with strict TCP readiness checks and 127.0.0.1 binding.
- Update `GetFreePort`: Return `int` instead of `*net.TCPAddr` for
  easier consumption.
- Refactor `BBRHarness`: Remove manual server startup boilerplate in
  favor of the new shared helper, reducing code duplication.

* test/epp: introduce isolated test harness and DSL

Add a new testing infrastructure for the Endpoint Picker (EPP) to enable
hermetic, namespace-isolated testing.

- Add `TestHarness`: Manages a namespace-scoped Controller Manager and
  server instance per test case. This ensures resources from one test
  do not pollute the scheduler state of another.
- Add DSL: Introduce intent-based builders (e.g., `ReqSubset`, `P`) and
  assertions (e.g., `ExpectRouteTo`) to decouple test logic from gRPC
  boilerplate.
- Support deterministic synchronization by waiting for Prometheus
  metrics to settle before generating traffic.

* test/epp: migrate integration suite to new harness

Rewrite the hermetic integration suite to use the new isolated harness
and DSL.

- Migrate all existing test cases from epp_test.go to hermetic_test.go.
- Enable pre-loading of base CRDs in TestMain to optimize setup time.
- Remove legacy setup code and global variable reliance.

* resolve conflicts from kubernetes-sigs#1976

* Use serverCtx instead of ctx.

* Rebase and resolve conflicts
elevran pushed a commit to llm-d/llm-d-inference-scheduler that referenced this pull request Apr 23, 2026
…ference-extension#1976)

This commit moves the existing `SaturationDetector` implementation to
its new home under the EPP plugin framework structure. This is a
preparatory step towards making `SaturationDetector` an official EPP
extension point.

- Moved files to `.../framework/plugins/utilizationdetector`
- Renamed `saturationdetector.go` to `detector.go`
- Renamed `saturationdetector_test.go` to `detector_test.go`
- Fixed imports

No functional changes are included in this commit.
nirrozenbaum pushed a commit to llm-d/llm-d-inference-payload-processor that referenced this pull request Apr 28, 2026
…ernetes-sigs/gateway-api-inference-extension#2022)

* test: promote lifecycle logic to shared utils

Refactor the integration test utilities to provide a standardized way to
start hermetic gRPC servers, and migrate the BBR harness to use it.

- Add `StartExtProcServer`: A helper for background server management
  with strict TCP readiness checks and 127.0.0.1 binding.
- Update `GetFreePort`: Return `int` instead of `*net.TCPAddr` for
  easier consumption.
- Refactor `BBRHarness`: Remove manual server startup boilerplate in
  favor of the new shared helper, reducing code duplication.

* test/epp: introduce isolated test harness and DSL

Add a new testing infrastructure for the Endpoint Picker (EPP) to enable
hermetic, namespace-isolated testing.

- Add `TestHarness`: Manages a namespace-scoped Controller Manager and
  server instance per test case. This ensures resources from one test
  do not pollute the scheduler state of another.
- Add DSL: Introduce intent-based builders (e.g., `ReqSubset`, `P`) and
  assertions (e.g., `ExpectRouteTo`) to decouple test logic from gRPC
  boilerplate.
- Support deterministic synchronization by waiting for Prometheus
  metrics to settle before generating traffic.

* test/epp: migrate integration suite to new harness

Rewrite the hermetic integration suite to use the new isolated harness
and DSL.

- Migrate all existing test cases from epp_test.go to hermetic_test.go.
- Enable pre-loading of base CRDs in TestMain to optimize setup time.
- Remove legacy setup code and global variable reliance.

* resolve conflicts from kubernetes-sigs/gateway-api-inference-extension#1976

* Use serverCtx instead of ctx.

* Rebase and resolve conflicts
elevran pushed a commit to llm-d/llm-d-inference-scheduler that referenced this pull request May 3, 2026
…ference-extension#1976)

This commit moves the existing `SaturationDetector` implementation to
its new home under the EPP plugin framework structure. This is a
preparatory step towards making `SaturationDetector` an official EPP
extension point.

- Moved files to `.../framework/plugins/utilizationdetector`
- Renamed `saturationdetector.go` to `detector.go`
- Renamed `saturationdetector_test.go` to `detector_test.go`
- Fixed imports

No functional changes are included in this commit.
elevran pushed a commit to llm-d/llm-d-inference-scheduler that referenced this pull request May 3, 2026
…ernetes-sigs/gateway-api-inference-extension#2022)

* test: promote lifecycle logic to shared utils

Refactor the integration test utilities to provide a standardized way to
start hermetic gRPC servers, and migrate the BBR harness to use it.

- Add `StartExtProcServer`: A helper for background server management
  with strict TCP readiness checks and 127.0.0.1 binding.
- Update `GetFreePort`: Return `int` instead of `*net.TCPAddr` for
  easier consumption.
- Refactor `BBRHarness`: Remove manual server startup boilerplate in
  favor of the new shared helper, reducing code duplication.

* test/epp: introduce isolated test harness and DSL

Add a new testing infrastructure for the Endpoint Picker (EPP) to enable
hermetic, namespace-isolated testing.

- Add `TestHarness`: Manages a namespace-scoped Controller Manager and
  server instance per test case. This ensures resources from one test
  do not pollute the scheduler state of another.
- Add DSL: Introduce intent-based builders (e.g., `ReqSubset`, `P`) and
  assertions (e.g., `ExpectRouteTo`) to decouple test logic from gRPC
  boilerplate.
- Support deterministic synchronization by waiting for Prometheus
  metrics to settle before generating traffic.

* test/epp: migrate integration suite to new harness

Rewrite the hermetic integration suite to use the new isolated harness
and DSL.

- Migrate all existing test cases from epp_test.go to hermetic_test.go.
- Enable pre-loading of base CRDs in TestMain to optimize setup time.
- Remove legacy setup code and global variable reliance.

* resolve conflicts from kubernetes-sigs/gateway-api-inference-extension#1976

* Use serverCtx instead of ctx.

* Rebase and resolve conflicts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants