Refactor: Prepare EPP SaturationDetection as an Extension Point#1976
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
Hi @LukeAVanDrie. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
a6a78b8 to
06c821f
Compare
|
/approve FWIW, I think we should draw on board (or paper :) ) the updated design with the updated extension points and make sure we're aligned on how we think EPP should end up. will stamp with lgtm once the CI tests pass (should fix boilerplate headers). |
I agree with this. There is another PR that is proposing to fundamentally change how plugins can operate, and I'm worried about the precedent that sets. |
06c821f to
539d5c6
Compare
@kfswain not sure which other PR is the one you mentioned. from my point of view this PR can be stamped (it's just putting the existing saturation detector under a different directory). leaving to Kellen the final stamp just to be on the safe side. |
He's referring to my other PR: #1977. I am working on the configuration and extensibility story for Flow Control. Flow Control has two potential extension points: interflow (fairness) and intraflow (ordering) policies. These, however, are scoped to priority bands or flows respectively and not well-supported by the singleton plugin model in their current state. I have a draft to support transient lifecycle for stateful, scoped plugin instances. Shmuel recommended a stateless singleton approach relying on state-passing which I am now exploring instead as it is more closely aligned with our existing plugin model and less easy to abuse.
This will be a good exercise. I am currently working toward adding (or rather promoting) the following extension points for our Flow Control story:
These then compose nicely into a flow control / scheduling regime.
And so on... These can all be mixed and matched. I figured it was generally agreed upon to start this for saturation detection given #1405. I plan on doing exactly what's in the bug. Promoting our simple heuristic-based detector to the default implementation before adding a new in-tree concurrency-limit based controller as a followup. |
|
as a side note I must admit that I’ve read the explanation about what is InterFlowPolicy and what is IntraFlowPolicy at least 20 times in different threads, and I still find these names very confusing. do you think we can rename it to a descriptive name- just by what it does? |
Yes, these can be changed. They are not sticky at all yet. I think These will eventually be reflected in our configuration guide docs and such, so we should pick the best user-facing names. Edit: mocking the config example makes choices more clear. Thoughts? |
539d5c6 to
2e3fb2f
Compare
|
Rebased and resolved merge conflict |
|
/assign @nirrozenbaum PR should be in a ready state to merge now. Thanks for the feedback! |
| "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/metrics" | ||
| "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/requestcontrol" | ||
| "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/saturationdetector" | ||
| "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/saturationcontrol/framework/plugins/staticthresholdcontroller" |
There was a problem hiding this comment.
I don't like the use of the word controller in the name here. The EPP is a K8S controller for several CRDs (and growing). I think that is the only context for use of the word controller in a name.
Is the plan for this new extension point to control saturation or just detect saturation and let other components decide what to do?
Also as the word static is used in the name here, is there an intention to a dynamic threshold plugin?
There was a problem hiding this comment.
++ agree with @shmuelk's comments.
why do we rename from detector to controller? that is confusing when reading this code in the context of k8s.
There was a problem hiding this comment.
@shmuelk Thanks for the feedback. I agree that "Controller" is overloaded in the K8s ecosystem and we should avoid confusion with CRD reconcilers. However, I want to move away from "Detector" because the scope of this component is expanding. In the legacy implementation, it was a passive observer returning a boolean. I have some new implementations that act as a closed-loop control systems. They manage internal state and actively calculate dispatch rates or concurrency limits based on feedback variables.
Roadmap:
- Static Thresholds: (This PR) Throttles based on statically configured metric thresholds.
- Static Concurrency: Throttles based on statically configured concurrency limits per-pod.
- Adaptive/Dynamic Concurrency (exploratory/stretch goal): A feedback loop that auto-tune limits.
So, we have a component that actively manages request flow (via its process variable, "saturation") and is somewhat steeped in control theory. This is how I landed on "Controller."
To avoid overloading this term, I suggest renaming this extension point to "SaturationRegulator". This preserves the semantics distinguishing it from passive observation. The path would look like: pkg/epp/saturationregulator/framework/plugins/staticthresholdregulator/regulator.go.
There was a problem hiding this comment.
Thanks again for the feedback! I’ve updated the PR to stick with the 'Detector' terminology to align with existing conventions and your suggestions. To better reflect the specific logic of this implementation and the pluggable architecture, I've made the following changes:
- Renamed to
utilizationdetector: This maintains the 'Detector' suffix you requested while being more descriptive and less academic than the previous 'staticthreshold' proposal. It specifically detects saturation based on resource utilization (Queue/KV Cache). - Path Changes: I've moved the logic into:
pkg/epp/saturationdetector/framework/plugins/utilizationdetector/detector.go.
My planned concurrency plugins then become: concurrencydetector and adaptiveconcurrencydetector (or similar).
I have force-pushed these changes over the previous commit. I wanted to avoid a messy history of chained git mv commands and ensure the file history for the original detector logic remains clean and easy to follow in the new path.
Hopefully this resolves your concerns here. PTAL!
There was a problem hiding this comment.
That sounds good to me! Thanks Luke!
This commit moves the existing `SaturationDetector` implementation to its new home under the EPP plugin framework structure. This is a preparatory step towards making `SaturationDetector` an official EPP extension point. - Moved files to `.../framework/plugins/utilizationdetector` - Renamed `saturationdetector.go` to `detector.go` - Renamed `saturationdetector_test.go` to `detector_test.go` - Fixed imports No functional changes are included in this commit.
2e3fb2f to
6f17d94
Compare
|
/approve Will leave final stamp to others! Thanks! Stoked about this |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kfswain, LukeAVanDrie, nirrozenbaum The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/lgtm Thanks Luke 🙏🏼 |
* test: promote lifecycle logic to shared utils Refactor the integration test utilities to provide a standardized way to start hermetic gRPC servers, and migrate the BBR harness to use it. - Add `StartExtProcServer`: A helper for background server management with strict TCP readiness checks and 127.0.0.1 binding. - Update `GetFreePort`: Return `int` instead of `*net.TCPAddr` for easier consumption. - Refactor `BBRHarness`: Remove manual server startup boilerplate in favor of the new shared helper, reducing code duplication. * test/epp: introduce isolated test harness and DSL Add a new testing infrastructure for the Endpoint Picker (EPP) to enable hermetic, namespace-isolated testing. - Add `TestHarness`: Manages a namespace-scoped Controller Manager and server instance per test case. This ensures resources from one test do not pollute the scheduler state of another. - Add DSL: Introduce intent-based builders (e.g., `ReqSubset`, `P`) and assertions (e.g., `ExpectRouteTo`) to decouple test logic from gRPC boilerplate. - Support deterministic synchronization by waiting for Prometheus metrics to settle before generating traffic. * test/epp: migrate integration suite to new harness Rewrite the hermetic integration suite to use the new isolated harness and DSL. - Migrate all existing test cases from epp_test.go to hermetic_test.go. - Enable pre-loading of base CRDs in TestMain to optimize setup time. - Remove legacy setup code and global variable reliance. * resolve conflicts from #1976 * Use serverCtx instead of ctx. * Rebase and resolve conflicts
This commit moves the existing `SaturationDetector` implementation to its new home under the EPP plugin framework structure. This is a preparatory step towards making `SaturationDetector` an official EPP extension point. - Moved files to `.../framework/plugins/utilizationdetector` - Renamed `saturationdetector.go` to `detector.go` - Renamed `saturationdetector_test.go` to `detector_test.go` - Fixed imports No functional changes are included in this commit.
…ernetes-sigs#2022) * test: promote lifecycle logic to shared utils Refactor the integration test utilities to provide a standardized way to start hermetic gRPC servers, and migrate the BBR harness to use it. - Add `StartExtProcServer`: A helper for background server management with strict TCP readiness checks and 127.0.0.1 binding. - Update `GetFreePort`: Return `int` instead of `*net.TCPAddr` for easier consumption. - Refactor `BBRHarness`: Remove manual server startup boilerplate in favor of the new shared helper, reducing code duplication. * test/epp: introduce isolated test harness and DSL Add a new testing infrastructure for the Endpoint Picker (EPP) to enable hermetic, namespace-isolated testing. - Add `TestHarness`: Manages a namespace-scoped Controller Manager and server instance per test case. This ensures resources from one test do not pollute the scheduler state of another. - Add DSL: Introduce intent-based builders (e.g., `ReqSubset`, `P`) and assertions (e.g., `ExpectRouteTo`) to decouple test logic from gRPC boilerplate. - Support deterministic synchronization by waiting for Prometheus metrics to settle before generating traffic. * test/epp: migrate integration suite to new harness Rewrite the hermetic integration suite to use the new isolated harness and DSL. - Migrate all existing test cases from epp_test.go to hermetic_test.go. - Enable pre-loading of base CRDs in TestMain to optimize setup time. - Remove legacy setup code and global variable reliance. * resolve conflicts from kubernetes-sigs#1976 * Use serverCtx instead of ctx. * Rebase and resolve conflicts
…ference-extension#1976) This commit moves the existing `SaturationDetector` implementation to its new home under the EPP plugin framework structure. This is a preparatory step towards making `SaturationDetector` an official EPP extension point. - Moved files to `.../framework/plugins/utilizationdetector` - Renamed `saturationdetector.go` to `detector.go` - Renamed `saturationdetector_test.go` to `detector_test.go` - Fixed imports No functional changes are included in this commit.
…ernetes-sigs/gateway-api-inference-extension#2022) * test: promote lifecycle logic to shared utils Refactor the integration test utilities to provide a standardized way to start hermetic gRPC servers, and migrate the BBR harness to use it. - Add `StartExtProcServer`: A helper for background server management with strict TCP readiness checks and 127.0.0.1 binding. - Update `GetFreePort`: Return `int` instead of `*net.TCPAddr` for easier consumption. - Refactor `BBRHarness`: Remove manual server startup boilerplate in favor of the new shared helper, reducing code duplication. * test/epp: introduce isolated test harness and DSL Add a new testing infrastructure for the Endpoint Picker (EPP) to enable hermetic, namespace-isolated testing. - Add `TestHarness`: Manages a namespace-scoped Controller Manager and server instance per test case. This ensures resources from one test do not pollute the scheduler state of another. - Add DSL: Introduce intent-based builders (e.g., `ReqSubset`, `P`) and assertions (e.g., `ExpectRouteTo`) to decouple test logic from gRPC boilerplate. - Support deterministic synchronization by waiting for Prometheus metrics to settle before generating traffic. * test/epp: migrate integration suite to new harness Rewrite the hermetic integration suite to use the new isolated harness and DSL. - Migrate all existing test cases from epp_test.go to hermetic_test.go. - Enable pre-loading of base CRDs in TestMain to optimize setup time. - Remove legacy setup code and global variable reliance. * resolve conflicts from kubernetes-sigs/gateway-api-inference-extension#1976 * Use serverCtx instead of ctx. * Rebase and resolve conflicts
…ference-extension#1976) This commit moves the existing `SaturationDetector` implementation to its new home under the EPP plugin framework structure. This is a preparatory step towards making `SaturationDetector` an official EPP extension point. - Moved files to `.../framework/plugins/utilizationdetector` - Renamed `saturationdetector.go` to `detector.go` - Renamed `saturationdetector_test.go` to `detector_test.go` - Fixed imports No functional changes are included in this commit.
…ernetes-sigs/gateway-api-inference-extension#2022) * test: promote lifecycle logic to shared utils Refactor the integration test utilities to provide a standardized way to start hermetic gRPC servers, and migrate the BBR harness to use it. - Add `StartExtProcServer`: A helper for background server management with strict TCP readiness checks and 127.0.0.1 binding. - Update `GetFreePort`: Return `int` instead of `*net.TCPAddr` for easier consumption. - Refactor `BBRHarness`: Remove manual server startup boilerplate in favor of the new shared helper, reducing code duplication. * test/epp: introduce isolated test harness and DSL Add a new testing infrastructure for the Endpoint Picker (EPP) to enable hermetic, namespace-isolated testing. - Add `TestHarness`: Manages a namespace-scoped Controller Manager and server instance per test case. This ensures resources from one test do not pollute the scheduler state of another. - Add DSL: Introduce intent-based builders (e.g., `ReqSubset`, `P`) and assertions (e.g., `ExpectRouteTo`) to decouple test logic from gRPC boilerplate. - Support deterministic synchronization by waiting for Prometheus metrics to settle before generating traffic. * test/epp: migrate integration suite to new harness Rewrite the hermetic integration suite to use the new isolated harness and DSL. - Migrate all existing test cases from epp_test.go to hermetic_test.go. - Enable pre-loading of base CRDs in TestMain to optimize setup time. - Remove legacy setup code and global variable reliance. * resolve conflicts from kubernetes-sigs/gateway-api-inference-extension#1976 * Use serverCtx instead of ctx. * Rebase and resolve conflicts
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This PR refactors the directory structure for the saturation control logic within the Endpoint Picker (EPP). It moves the current implementation to
framework/plugins/utilizationdetector.This change is a preparatory step to establish Saturation Detector as a formal extension point within the EPP, allowing
for multiple implementations. The existing logic is now housed as the
utilizationdetectorplugin, making wayfor future additions like
concurrencydetector.This PR contains only file moves and renames; no functional changes are introduced. Note: it is not aligned with the
EPP plugin system yet. This is a preparatory step only.
Which issue(s) this PR fixes:
Tracks #1405 ('cc @nirrozenbaum) and #1793
Does this PR introduce a user-facing change?: