feat(Flow Control)/Expand Flow Control capacity limits schema (resource.Quantity) #2492

Merged
k8s-ci-robot merged 11 commits into kubernetes-sigs:main from BizerNotNull:feat/expand-flow-control-schema
Mar 14, 2026

Conversation

@BizerNotNull
Contributor

@BizerNotNull BizerNotNull commented Mar 5, 2026

What type of PR is this?

/kind feature
/kind test
/kind deprecation
What this PR does / why we need it:

  1. This PR is a breaking change to the config API: it deprecates maxBytes in endpointpickerconfig_type in favor of limits.maxBytes, adding the CapacityLimits struct.
  2. limits.maxBytes uses resource.Quantity instead of an int, so you can write values such as 1Gi, as elsewhere in Kubernetes.
    For example, before:

flowControl:
  priorityBands:
    - priority: 100
      maxBytes: 1000

after:

flowControl:
  priorityBands:
    - priority: 100
      maxBytes: 1Gi

  3. config.go: add a helper, resolveMaxBytes, that replaces the maxBytes check; the conversion from resource.Quantity to uint64 also happens there.
  4. I ran make generate to regenerate the deepcopy code.
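
The conversion described in the list above can be sketched without the Kubernetes dependency. This is a minimal, stdlib-only approximation, not the code from this PR: `parseByteQuantity` and this version of `resolveMaxBytes` are hypothetical stand-ins for a helper that operates on `resource.Quantity` from k8s.io/apimachinery, which accepts more forms than the handful of suffixes handled here. Treating an unset value as 0 (unlimited) is assumed from the field comment shown in the review thread.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// suffixes maps a subset of Kubernetes-style quantity suffixes to multipliers.
// Assumption: only these suffixes are handled; resource.Quantity supports more.
var suffixes = []struct {
	name string
	mult uint64
}{
	{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30}, {"Ti", 1 << 40},
	{"k", 1e3}, {"M", 1e6}, {"G", 1e9}, {"T", 1e12},
}

// parseByteQuantity converts strings such as "1000" or "1Gi" to a byte count.
// Negative, fractional, or overflowing values are rejected.
func parseByteQuantity(s string) (uint64, error) {
	s = strings.TrimSpace(s)
	mult := uint64(1)
	for _, sf := range suffixes {
		if strings.HasSuffix(s, sf.name) {
			s = strings.TrimSuffix(s, sf.name)
			mult = sf.mult
			break
		}
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid quantity: %w", err)
	}
	if n != 0 && mult > math.MaxUint64/n {
		return 0, errors.New("quantity overflows uint64")
	}
	return n * mult, nil
}

// resolveMaxBytes (hypothetical) returns 0, meaning unlimited, when no value
// is configured, and the parsed byte count otherwise.
func resolveMaxBytes(raw *string) (uint64, error) {
	if raw == nil {
		return 0, nil
	}
	return parseByteQuantity(*raw)
}

func main() {
	inputs := []string{"1000", "1Gi", "-1"}
	for i := range inputs {
		v, err := resolveMaxBytes(&inputs[i])
		fmt.Println(inputs[i], v, err)
	}
}
```

Note that binary suffixes (Ki, Mi, Gi) are powers of 1024 while decimal suffixes (k, M, G) are powers of 1000, so `1Gi` and `1G` are different limits.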
Which issue(s) this PR fixes:

Fixes Part 1 of #2486

Does this PR introduce a user-facing change?:

`maxBytes` now uses `resource.Quantity` instead of `int`, so you can specify values such as `1Gi`, as elsewhere in Kubernetes.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. kind/deprecation Categorizes issue or PR as related to a feature/enhancement marked for deprecation. labels Mar 5, 2026
@k8s-ci-robot
Contributor

@BizerNotNull: The label(s) kind/test cannot be applied, because the repository doesn't have them.

Details

In response to this:

What type of PR is this?

/kind feature
/kind test
/kind deprecation
What this PR does / why we need it:

  1. This PR is a breaking change to the config API: it deprecates maxBytes in endpointpickerconfig_type in favor of limits.maxBytes.
  2. limits.maxBytes uses resource.Quantity instead of an int, so you can write values such as 1Gi, as elsewhere in Kubernetes.
    For example, before:

flowControl:
 priorityBands:
   - priority: 100
     maxBytes: 1000

after:

flowControl:
 priorityBands:
   - priority: 100
     limits:
       maxBytes: 1Gi

  3. endpointpickerconfig_type.go: replace the maxBytes config with limits.maxBytes and adjust the log output accordingly.
  4. config.go: add a helper, resolveMaxBytes, that replaces the maxBytes check; the conversion from resource.Quantity to uint64 also happens there.
  5. testdata_test, config_test, registry/config_test: change the old maxBytes to limits.maxBytes, and add one unit test covering the 1Gi case.

Which issue(s) this PR fixes:

This PR is the first part of the fix for #2486

Does this PR introduce a user-facing change?:

We deprecate `maxBytes` in `endpointpickerconfig_type` in favor of `limits.maxBytes`. `limits.maxBytes` uses `resource.Quantity` instead of `int`, so you can specify values such as `1Gi`, as elsewhere in Kubernetes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@linux-foundation-easycla

linux-foundation-easycla Bot commented Mar 5, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@netlify

netlify Bot commented Mar 5, 2026

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 8442b98
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/69b25be2b6f33a00087cdd91
😎 Deploy Preview https://deploy-preview-2492--gateway-api-inference-extension.netlify.app

@k8s-ci-robot
Contributor

Welcome @BizerNotNull!

It looks like this is your first PR to kubernetes-sigs/gateway-api-inference-extension 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/gateway-api-inference-extension has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 5, 2026
@k8s-ci-robot
Contributor

Hi @BizerNotNull. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Mar 5, 2026
@BizerNotNull BizerNotNull marked this pull request as ready for review March 5, 2026 08:25
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 5, 2026
@k8s-ci-robot k8s-ci-robot requested review from kfswain and shmuelk March 5, 2026 08:25
Contributor

@ahg-g ahg-g left a comment

/ok-to-test
/assign @LukeAVanDrie

Comment thread conformance/go.mod Outdated
github.com/fxamacker/cbor/v2 v2.9.0 // indirect
github.com/go-openapi/jsonpointer v0.22.1 // indirect
github.com/go-openapi/jsonreference v0.21.3 // indirect
github.com/go-openapi/jsonpointer v0.22.4 // indirect
Contributor

pls revert the changes to the go.mod and go.sum

Contributor Author

I've reverted this. The change happened because the go.mod in the root directory and the one in the conformance folder have mismatched versions, so both files get updated every time make test is run.

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 5, 2026
// Default: 0 (unlimited).
MaxBytes *int64 `json:"maxBytes,omitempty"`
// If not specified, no global limits are enforced.
Limits *CapacityLimits `json:"limits,omitempty"`
Contributor

Just curious what other types of limits you foresee here. Since we are in a config object, keeping the API flat also seems reasonable.

Contributor

I think just requests and bytes. The only other possibility I can think of is estimated tokens, but that doesn't seem practical at the moment. Definitely open to keeping the structure flat.

Contributor Author

I've flattened Limits in the new commits; I'll update the PR description soon.

}

// CapacityLimits defines capacity boundaries for a priority band.
type CapacityLimits struct {
Contributor

Question: since the proposed key is limits, will the name capacity still be reasonable in the future if, for example, we need rate-specific config?

Contributor Author

Currently, many of the functions related to maxBytes logic are using capacity in their names. I think we can open a separate issue to rename them when appropriate.

Contributor Author

On second thought, the maxBytes here doesn't refer to the concurrency accepted by the backend, but rather to the capacity of the manageQueue within each flow. From this perspective, 'capacity' is actually correct. When I implemented maxRequests in #2495, I also counted the number of items in the manageQueue.
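
The "capacity of a queue" reading can be pictured with a small sketch. Everything here is hypothetical and for illustration only: `queueLimits`, `managedQueue`, and `tryEnqueue` are invented names, not the flow control implementation, and the convention that 0 means unlimited is assumed from the field comment in this PR.

```go
package main

import "fmt"

// queueLimits is a hypothetical pair of capacity limits; 0 means unlimited.
type queueLimits struct {
	maxBytes    uint64
	maxRequests uint64
}

// managedQueue tracks what is currently waiting in one queue.
type managedQueue struct {
	limits   queueLimits
	bytes    uint64
	requests uint64
}

// tryEnqueue admits an item of the given size only if both limits still hold
// after it is added. The checks are written to avoid uint64 overflow.
func (q *managedQueue) tryEnqueue(size uint64) bool {
	if limit := q.limits.maxRequests; limit != 0 && q.requests >= limit {
		return false
	}
	if limit := q.limits.maxBytes; limit != 0 && (q.bytes > limit || size > limit-q.bytes) {
		return false
	}
	q.bytes += size
	q.requests++
	return true
}

func main() {
	q := &managedQueue{limits: queueLimits{maxBytes: 100}}
	fmt.Println(q.tryEnqueue(60)) // fits within the 100-byte capacity
	fmt.Println(q.tryEnqueue(60)) // would exceed the capacity, so it is rejected
}
```

Both limits bound what is held in the queue, not how fast requests arrive, which is why a rate-specific setting would be a different kind of config.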

@BizerNotNull BizerNotNull force-pushed the feat/expand-flow-control-schema branch from 9304893 to 0e54514 Compare March 6, 2026 02:16
Contributor

@LukeAVanDrie LukeAVanDrie left a comment

@LukeAVanDrie
Contributor

/assign @ahg-g

@BizerNotNull
Contributor Author

@LukeAVanDrie Sure, I'd like to.

@BizerNotNull
Contributor Author

The docs are done in 8442b98.

@ahg-g
Contributor

ahg-g commented Mar 14, 2026

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, BizerNotNull, LukeAVanDrie

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 14, 2026
@ahg-g
Contributor

ahg-g commented Mar 14, 2026

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 14, 2026
@k8s-ci-robot k8s-ci-robot merged commit 7704d32 into kubernetes-sigs:main Mar 14, 2026
9 checks passed
BizerNotNull added a commit to BizerNotNull/gateway-api-inference-extension that referenced this pull request Mar 15, 2026
…e.Quantity) (kubernetes-sigs#2492)

* feat(conf): add config `Limits` in EndpointPickerConfig
- add struct CapacityLimits
- add Limits Config in PriorityBandConfig and FlowControlConfig

* feat(config): Modify the MaxBytes configuration logic within the `NewConfigFromAPI` function:
- Add New Logic: Implement logic to write the new configuration field `Limit.MaxBytes` to a `uint64` variable, including the necessary type conversion.
- Remove Old Logic: Delete the existing logic responsible for writing the legacy `MaxBytes` configuration.

* chore(conf): supplement the comment for new conf

* feat(conf): delete the old config `MaxBytes`
- use the new conf for the logs print

* refactor(config): extract `resolveMaxBytes` helper and improve Limits comments

* feat(test): change the `MaxBytes` in test to `Limit.MaxBytes`

* feat(test): add `ShouldSucceed_WithKubernetesQuantityFormat` to test the k8s case

* chore(deepcopy): remake `deepcopy.go`

* revert(config): flatten the `Limits`

* chore(make): after `make`

* feat(docs): supplement flow control docs with a description of the `quantity format`
elevran pushed a commit to llm-d/llm-d-inference-scheduler that referenced this pull request Apr 23, 2026
…e.Quantity) (kubernetes-sigs/gateway-api-inference-extension#2492)


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/deprecation Categorizes issue or PR as related to a feature/enhancement marked for deprecation. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

5 participants