refactor(flowcontrol): Migrate Fairness Policies to EPP Plugin System#2031

Merged
k8s-ci-robot merged 6 commits into kubernetes-sigs:main from LukeAVanDrie:feature/fc-stateless-policy-refactor
Jan 20, 2026

Conversation

@LukeAVanDrie
Contributor

What type of PR is this?
/kind feature
/kind cleanup

What this PR does / why we need it:
This PR refactors the experimental Flow Control layer to use the standard EPP Plugin Registry for Fairness Policies (formerly, Inter-Flow Dispatch Policies), replacing the previous hardcoded factory implementation.

Architectural Change:
Previously, policies were instantiated per-band using a custom factory. Now, policies are registered as Singletons in the global EPP registry to align with the wider project architecture.

To support stateful logic (like Round Robin cursors) for each Priority Band while using singleton plugins, we utilize the flyweight pattern:

  1. Plugin: The Policy instance is a global stateless singleton (holding only immutable config).
  2. State: Mutable state (e.g., the cursor) is created via NewState(ctx) and stored on the PriorityBand.
  3. Execution: The Pick(ctx, band) method receives the specific state for that band at runtime.

Renaming & Standardization:

  • InterFlowDispatchPolicy → FairnessPolicy: Aligns terminology with the "Fairness" tier in the dispatch hierarchy.
  • BestHead → GlobalStrict: Renamed to better describe the behavior (strict global ordering ignoring flow boundaries) rather than the implementation.
  • Documentation: Added pkg/epp/flowcontrol/framework/plugins/interflow/doc.go to document the 3-Tier Dispatch Hierarchy (Priority → Fairness → Ordering).

Reviewer Guide:
This PR includes a significant refactor (~1.5k lines changed), but the logic changes are localized. I recommend reviewing by commit:

  1. Commit 1: Defines the new FairnessPolicy and Accessor interfaces.
  2. Commit 2: Ports RoundRobin and GlobalStrict to the new interface.
  3. Commit 3: Updates config/loader to validate/hydrate policies at startup.
  4. Commit 4: Updates the Registry and Controller to use the new system.
  5. Commit 5: Cleanups (deletes legacy factory and docs).

Relationship to Previous Proposals:
This implementation supersedes the "Transient Plugin Lifecycle" proposal (#1977).
Based on feedback from @shmuelk and @kfswain, I have abandoned the "Transient/Factory" framework changes in favor of the simpler flyweight pattern.

Which issue(s) this PR fixes:
Part of #1715

Does this PR introduce a user-facing change?:

[Experimental] Flow Control: Fairness Policies (formerly, Inter-Flow Dispatch Policies) are now integrated with the standard EPP Plugin system; however, they are not yet configurable.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 22, 2025
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 22, 2025
@k8s-ci-robot
Contributor

Hi @LukeAVanDrie. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@netlify

netlify Bot commented Dec 22, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 6eaba6b
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/696fd5710f98380008b4b553
😎 Deploy Preview https://deploy-preview-2031--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Dec 22, 2025
@LukeAVanDrie
Contributor Author

Sorry for the large diff! I thought a lot about how to split this, but I ultimately opted for a single atomic PR for two architectural reasons:

  1. Atomic Consistency: This is a binary state change. Moving from the legacy Factory pattern to the new Flyweight/Singleton pattern requires changing the interfaces, plugins, and registry wiring simultaneously. Splitting it would have required "bridge code" or dead interfaces, adding noise.
  2. Contextual Completeness: The FairnessPolicy interface (Commit 1) is best understood alongside its implementation in RoundRobin (Commit 2) and its wiring in the Registry (Commit 4).

Review Strategy:
To mitigate the size, I structured the git history for a linear, commit-by-commit review:

  • Commits 2 & 4 contain the actual logic changes.
  • Commits 3 & 5 are largely mechanical (config wiring and deleting legacy code).

I have also left inline reviewer comments to answer anticipated questions. Thanks!

Comment thread pkg/epp/flowcontrol/framework/plugins/interflow/roundrobin.go
Comment thread pkg/epp/flowcontrol/framework/plugins/interflow/doc.go
Comment thread pkg/epp/flowcontrol/framework/plugins/interflow/functional_test.go
Comment thread pkg/epp/flowcontrol/registry/config.go
Comment thread pkg/epp/flowcontrol/registry/shard.go
Comment thread pkg/epp/flowcontrol/registry/shard.go
Comment thread pkg/epp/flowcontrol/framework/accessors.go
Comment thread pkg/epp/flowcontrol/framework/plugins/interflow/factory.go
Comment thread pkg/epp/flowcontrol/framework/mocks/mocks.go
FlowKeysFunc func() []types.FlowKey
QueueFunc func(flowID string) framework.FlowQueueAccessor
IterateQueuesFunc func(callback func(queue framework.FlowQueueAccessor) (keepIterating bool))
IterateQueuesFunc func(callback func(flow framework.FlowQueueAccessor) (keepIterating bool))
Contributor

This function iterates over the queues of a flow, so why not say that in the name?

I think the name should be FlowQueuesIterator, again without the Func suffix.

Contributor Author

@LukeAVanDrie LukeAVanDrie Dec 22, 2025

I see the point, but IterateQueuesFunc is deliberately named to map 1:1 with the IterateQueues method in the PriorityBandAccessor interface it mocks. Changing the mock field name without changing the interface method name would reduce discoverability and consistency.

Renaming the interface method itself is something we can consider, but I'd prefer to handle that in a separate cleanup PR to keep this refactor focused.
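
For readers unfamiliar with this mock style, the sketch below shows why the field name tracks the method name. It is a simplified illustration: the interfaces are cut down to one method each, `Len` on `FlowQueueAccessor` and the `fixedQueue` and `totalLen` helpers are hypothetical, and only the `IterateQueues` / `IterateQueuesFunc` pairing is taken from the thread.

```go
package main

import "fmt"

// FlowQueueAccessor is reduced to one hypothetical method for this sketch.
type FlowQueueAccessor interface{ Len() int }

// PriorityBandAccessor is reduced to the one method under discussion.
type PriorityBandAccessor interface {
	IterateQueues(callback func(queue FlowQueueAccessor) (keepIterating bool))
}

// MockPriorityBandAccessor exposes one <Method>Func field per interface
// method, so a test author can find the hook by the method's name.
type MockPriorityBandAccessor struct {
	IterateQueuesFunc func(callback func(flow FlowQueueAccessor) (keepIterating bool))
}

// IterateQueues delegates to the field when it is set and does nothing otherwise.
func (m *MockPriorityBandAccessor) IterateQueues(callback func(queue FlowQueueAccessor) (keepIterating bool)) {
	if m.IterateQueuesFunc != nil {
		m.IterateQueuesFunc(callback)
	}
}

// fixedQueue is a hypothetical queue whose length is its own value.
type fixedQueue int

func (q fixedQueue) Len() int { return int(q) }

// totalLen is example code under test: it sums queue lengths in a band.
func totalLen(band PriorityBandAccessor) int {
	total := 0
	band.IterateQueues(func(queue FlowQueueAccessor) bool {
		total += queue.Len()
		return true
	})
	return total
}

func main() {
	mock := &MockPriorityBandAccessor{
		IterateQueuesFunc: func(callback func(flow FlowQueueAccessor) bool) {
			for _, q := range []fixedQueue{2, 3} {
				if !callback(q) {
					return
				}
			}
		},
	}
	fmt.Println(totalLen(mock)) // 5
}
```

With this convention, renaming only the field would leave `IterateQueues` and its hook with unrelated names, which is the discoverability concern raised above.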

Comment thread pkg/epp/flowcontrol/framework/mocks/mocks.go
Comment thread pkg/epp/flowcontrol/framework/plugins/interflow/besthead.go
@shmuelk
Contributor

shmuelk commented Dec 22, 2025

Thank you for changing direction here and going with a style that is in line with the other EPP plugins.

I have only reviewed a small part of this PR.

I think it should have been broken up into at least two PRs: one that did the renames and a second one that changed the flowcontrol extensions into proper EPP plugins.

@ahg-g
Contributor

ahg-g commented Dec 22, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 22, 2025
@LukeAVanDrie
Contributor Author

> I think it should have been broken up into at least two PRs. One that did the renames and second one that changed the flowcontrol extensions into proper EPP plugins.

Thanks! This is a fair point regarding BestHead -> GlobalStrict, which could probably have been deferred. I didn't separate the InterFlow -> Fairness rename because the interface signature also changed: a standalone rename would have introduced temporary code churn, and I felt it was cleaner to introduce the new interface correctly from the start as part of the plugin migration. That said, given that I had to update several references in Go docs, call sites, etc., splitting it out might still have been worthwhile. I will keep this in mind when migrating IntraFlowDispatchPolicy (to be renamed OrderingPolicy) next.

  PriorityName: template.PriorityName,
  IntraFlowDispatchPolicy: template.IntraFlowDispatchPolicy,
- InterFlowDispatchPolicy: template.InterFlowDispatchPolicy,
+ FairnessPolicy: template.FairnessPolicy,
Contributor

Why is the FairnessPolicy (the old InterFlowDispatchPolicy) per priority in a shard? Shouldn't it be at the shard level?

That is, unless I am misunderstanding something: I take the FairnessPolicy to arbitrate between priorities within a shard, while the IntraFlowDispatchPolicy orders the items within a priority.

@shmuelk
Contributor

shmuelk commented Dec 25, 2025

While you're renaming things here, I really don't understand the use of the word band throughout the flow control layer.

Bands usually are ranges of things (to quote the AI part of a Google search "or a range of values (like a salary band)"). Here a band is for a single priority.

In the following code snippet it is clear that a PriorityBand is for a single priority.

type PriorityBandConfig struct {
	// Priority is the unique numerical priority level for this band.
	// Convention: Highest numeric value corresponds to highest priority (centered on 0).
	// Required.
	Priority int
	...
}

@ahg-g
Contributor

ahg-g commented Jan 5, 2026

General comment: Please focus only on aligning on the user-facing names for now; names of internal structs/types can be changed separately as a follow-up.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 7, 2026
Comment thread pkg/epp/flowcontrol/registry/config.go
@ahg-g
Contributor

ahg-g commented Jan 7, 2026

I am OK with this, but for the future: this PR is massive, and it could have been split into two, the renaming and the architectural change for plugin creation.

Define the new `framework.FairnessPolicy` interface to replace the
legacy `InterFlowDispatchPolicy`.

This introduces a separation of concerns (Stateless Logic + Scoped
State) to enable integration with the singleton EPP Plugin Registry.

Includes:
- `FairnessPolicy` interface with `NewState` and `Pick` methods.
- `PriorityBandAccessor` interface updates for state retrieval.
- Updated mocks in `framework/mocks`.
Port existing policies to the new `FairnessPolicy` interface and
register them with the EPP Plugin Registry.

- `RoundRobin`: Implements stateful cursor logic using `NewState`.
- `GlobalStrict` (formerly BestHead): Implements stateless greedy logic.
- Adds `interflow/doc.go` to document the 3-tier hierarchy.
Update config loader to validate and hydrate Fairness Policies using
EPP Plugin Registry handles.

- Updates `PriorityBandConfig` to hold `FairnessPolicy` instances.
- Moves policy resolution from runtime (shard creation) to config load
  time to fail fast on missing plugins.
- Hydrates default policies via the plugin handle.

- Update `PriorityBandConfig` to hold a `FairnessPolicy` instance.
- Move policy resolution from runtime (shard creation) to config load
  time.
- Ensure default policies (`GlobalStrict`) are strictly validated.
Update `RegistryShard` and `FlowController` to use the new plugin-based
`FairnessPolicy` instead of the legacy factory.

- Initializes policy state via `NewState` during band creation.
- Updates dispatch loop to pass context and state to `Pick()`.
- Preserves granular, per-band locking semantics.
Remove the obsolete `interflow/factory.go` and `RegisteredPolicies` map
now that all logic uses the standard EPP `plugins` package.

- Deletes legacy `InterFlowDispatchPolicy` interface from `policies.go`.
- Deletes outdated `interflow/README.md` in favor of `doc.go`.
- Updates comments to reflect new terminology.
@LukeAVanDrie LukeAVanDrie force-pushed the feature/fc-stateless-policy-refactor branch from 2783569 to ce79bae Compare January 20, 2026 18:31
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 20, 2026
@LukeAVanDrie
Contributor Author

LukeAVanDrie commented Jan 20, 2026

> While you're renaming things here, I really don't understand the use of the word band throughout the flow control layer.
>
> Bands usually are ranges of things (to quote the AI part of a Google search "or a range of values (like a salary band)"). Here a band is for a single priority.

Thanks for the feedback. In our domain model, a "Priority Band" represents the logical grouping of all state and resources associated with a specific priority level. While "band" often implies a range of values, here it refers to a "horizontal slice" of traffic entitlement.

To clarify this distinction and scope, I've updated the PriorityBandConfig struct documentation in config.go to explicitly define it: "A 'Band' is defined as the collection (or range) of all flows having the same priority level."

This terminology is used similarly in Linux Traffic Control if you want prior art here.
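
A small sketch may help make the "band" definition tangible: flows that share a priority level are collected into one band, and bands are visited from the highest numeric priority down (the convention stated in the PriorityBandConfig comment quoted earlier in this thread). The `flow` type and both helper functions are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// flow is a hypothetical flow descriptor for this sketch.
type flow struct {
	ID       string
	Priority int
}

// groupIntoBands collects the IDs of flows that share a priority into one band.
func groupIntoBands(flows []flow) map[int][]string {
	bands := make(map[int][]string)
	for _, f := range flows {
		bands[f.Priority] = append(bands[f.Priority], f.ID)
	}
	return bands
}

// priorityOrder lists band priorities from highest to lowest numeric value.
func priorityOrder(bands map[int][]string) []int {
	levels := make([]int, 0, len(bands))
	for p := range bands {
		levels = append(levels, p)
	}
	sort.Sort(sort.Reverse(sort.IntSlice(levels)))
	return levels
}

func main() {
	bands := groupIntoBands([]flow{
		{ID: "batch", Priority: -1},
		{ID: "chat-a", Priority: 1},
		{ID: "chat-b", Priority: 1},
		{ID: "default", Priority: 0},
	})
	// Prints one line per band, highest priority first:
	// 1 [chat-a chat-b]
	// 0 [default]
	// -1 [batch]
	for _, p := range priorityOrder(bands) {
		fmt.Println(p, bands[p])
	}
}
```

In this picture a fairness policy would choose among the flows inside a single band (for example between chat-a and chat-b), never across bands.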

Clarify architectural concepts and naming conventions based on review
feedback.
- Clarify "Priority Band" definition as a "range of flows".
- Explicitly define FairnessPolicy scope within a band.
- Document Mock naming conventions (V suffix for values) to explain
  divergence from interface method names.
- Annotate WithFairnessPolicy for future config loader usage (Issue
  kubernetes-sigs#1794).
@ahg-g
Contributor

ahg-g commented Jan 20, 2026

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 20, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, LukeAVanDrie

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 20, 2026
@k8s-ci-robot k8s-ci-robot merged commit 0c0ee27 into kubernetes-sigs:main Jan 20, 2026
11 checks passed
RyanRosario pushed a commit to RyanRosario/gateway-api-inference-extension that referenced this pull request Jan 20, 2026
…kubernetes-sigs#2031)

* refactor: define FairnessPolicy interface

Define the new `framework.FairnessPolicy` interface to replace the
legacy `InterFlowDispatchPolicy`.

This introduces a separation of concerns (Stateless Logic + Scoped
State) to enable integration with the singleton EPP Plugin Registry.

Includes:
- `FairnessPolicy` interface with `NewState` and `Pick` methods.
- `PriorityBandAccessor` interface updates for state retrieval.
- Updated mocks in `framework/mocks`.

* feat(flowcontrol): port policies to plugin system

Port existing policies to the new `FairnessPolicy` interface and
register them with the EPP Plugin Registry.

- `RoundRobin`: Implements stateful cursor logic using `NewState`.
- `GlobalStrict` (formerly BestHead): Implements stateless greedy logic.
- Adds `interflow/doc.go` to document the 3-tier hierarchy.

* feat: integrate policies with EPP config

Update config loader to validate and hydrate Fairness Policies using
EPP Plugin Registry handles.

- Updates `PriorityBandConfig` to hold `FairnessPolicy` instances.
- Moves policy resolution from runtime (shard creation) to config load
  time to fail fast on missing plugins.
- Hydrates default policies via the plugin handle.

- Update `PriorityBandConfig` to hold a `FairnessPolicy` instance.
- Move policy resolution from runtime (shard creation) to config load
  time.
- Ensure default policies (`GlobalStrict`) are strictly validated.

* refactor(flowcontrol): use FairnessPolicy plugins

Update `RegistryShard` and `FlowController` to use the new plugin-based
`FairnessPolicy` instead of the legacy factory.

- Initializes policy state via `NewState` during band creation.
- Updates dispatch loop to pass context and state to `Pick()`.
- Preserves granular, per-band locking semantics.

* cleanup(flowcontrol): remove legacy policy factory

Remove the obsolete `interflow/factory.go` and `RegisteredPolicies` map
now that all logic uses the standard EPP `plugins` package.

- Deletes legacy `InterFlowDispatchPolicy` interface from `policies.go`.
- Deletes outdated `interflow/README.md` in favor of `doc.go`.
- Updates comments to reflect new terminology.

* refactor(flowcontrol): address reviewer feedback

Clarify architectural concepts and naming conventions based on review
feedback.
- Clarify "Priority Band" definition as a "range of flows".
- Explicitly define FairnessPolicy scope within a band.
- Document Mock naming conventions (V suffix for values) to explain
  divergence from interface method names.
- Annotate WithFairnessPolicy for future config loader usage (Issue
  kubernetes-sigs#1794).
elevran pushed a commit to llm-d/llm-d-inference-scheduler that referenced this pull request Apr 23, 2026
…kubernetes-sigs/gateway-api-inference-extension#2031)

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
