flowcontrol: add benchmark suite #2539

Merged
k8s-ci-robot merged 2 commits into kubernetes-sigs:main from LukeAVanDrie:feat/flow-control-benchmarking on Mar 11, 2026

@LukeAVanDrie (Contributor) commented Mar 9, 2026

What type of PR is this?

/kind test

What this PR does / why we need it:

This PR introduces a comprehensive, steady-state benchmarking suite for the Flow Control layer. This suite is essential for establishing baseline performance metrics, identifying concurrency bottlenecks, and ensuring we prevent regressions as the system scales.

Results:

For reference, results were collected on an 8-core Intel(R) Xeon(R) CPU @ 2.20GHz using:

go test -v -bench=. -run="^$" \
  -benchtime=1s \
  -count=5 \
  -cpuprofile=baseline_cpu.prof \
  -memprofile=baseline_mem.prof \
  -blockprofile=baseline_block.prof \
  -mutexprofile=baseline_mutex.prof \
  ./pkg/epp/flowcontrol/benchmark/... | tee baseline.txt

The bulk of the testing is driven by BenchmarkFlowController_PerformanceMatrix, which sweeps the following values to cover the full performance hypercube:

  • Egress concurrency (modeling IFR-based saturation): {free-flow (always unsaturated), 1, 100}
  • # Shards (data parallelism): {1, 8}
  • # Priorities: {1, 8}
  • # Flows: {10, 50000}
  • Ingress concurrency (# concurrent requests): {10, 5000, 50000}
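Assuming the matrix is the full Cartesian product of the values above (which is what "full performance hypercube" implies), it spans 3 × 2 × 2 × 2 × 3 = 72 configurations. A sketch of enumerating it, with hypothetical type and field names rather than those in benchmark_test.go:

```go
package main

import "fmt"

// benchConfig is a hypothetical point in the performance matrix; the field
// names are illustrative and not taken from benchmark_test.go.
type benchConfig struct {
	Egress     int // 0 means free-flow (never saturated)
	Shards     int
	Priorities int
	Flows      int
	Ingress    int
}

// Name builds a sub-benchmark name, e.g. for b.Run(cfg.Name(), ...).
func (c benchConfig) Name() string {
	egress := "free"
	if c.Egress > 0 {
		egress = fmt.Sprint(c.Egress)
	}
	return fmt.Sprintf("egress=%s_shards=%d_priorities=%d_flows=%d_ingress=%d",
		egress, c.Shards, c.Priorities, c.Flows, c.Ingress)
}

// hypercube returns every combination of the swept values.
func hypercube() []benchConfig {
	var out []benchConfig
	for _, egress := range []int{0, 1, 100} {
		for _, shards := range []int{1, 8} {
			for _, prios := range []int{1, 8} {
				for _, flows := range []int{10, 50000} {
					for _, ingress := range []int{10, 5000, 50000} {
						out = append(out, benchConfig{egress, shards, prios, flows, ingress})
					}
				}
			}
		}
	}
	return out
}

func main() {
	cfgs := hypercube()
	fmt.Println(len(cfgs))      // prints 72
	fmt.Println(cfgs[0].Name()) // prints egress=free_shards=1_priorities=1_flows=10_ingress=10
}
```

Each config would typically become a sub-benchmark via `b.Run(cfg.Name(), ...)`; with this naming, a single slice of the matrix can be selected with a pattern such as `-bench='PerformanceMatrix/egress=1_'`.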

In short, the Flow Control layer is performing efficiently. The dominant blocking factor is ordinary channel backpressure (selectgo in the profiles), which is expected given our queuing strategy. Lock contention is minimal and localized. The primary limit on raw macroscopic throughput is garbage collection (GC) overhead caused by allocations in IterateQueues and when vending the FlowQueueAccessor. Both are clear, isolated targets for zero-allocation strategies. I have already fixed and validated this locally and will send the targeted performance improvements as follow-up PRs.


Which issue(s) this PR fixes:
Part of #2087

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot (Contributor)

@LukeAVanDrie: The label(s) kind/test cannot be applied, because the repository doesn't have them.


@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 9, 2026
netlify Bot commented Mar 9, 2026

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: a8b6b2d
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/69b0a04e9816590008f7d22c
😎 Deploy Preview: https://deploy-preview-2539--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Mar 9, 2026
@LukeAVanDrie (Contributor, Author)

/assign @kfswain

@ahg-g (Contributor) commented Mar 10, 2026

Great, unless I missed it, can you pls describe the details of the workload (request rate, number of flows, number of priorities) you benchmarked in the PR description?

@LukeAVanDrie (Contributor, Author) commented Mar 10, 2026

> Great, unless I missed it, can you pls describe the details of the workload (request rate, number of flows, number of priorities) you benchmarked in the PR description?

This is defined and well documented in benchmark_test.go. I run the full matrix of these values (each dimension is swept against every fixed combination of the others), yielding the full performance hypercube. Full results are in the benchstat file linked in the PR description. The pprof data (tops and charts) is aggregated across the full testing matrix rather than broken out per configuration.

I just updated the PR description with this information.

@ahg-g (Contributor) commented Mar 10, 2026

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 10, 2026
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, LukeAVanDrie


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 10, 2026
Introduces a synchronous steady-state benchmark harness to evaluate the
Flow Control layer's algorithmic throughput and scalability.
@LukeAVanDrie LukeAVanDrie force-pushed the feat/flow-control-benchmarking branch from 7cbbe72 to a8b6b2d on March 10, 2026 22:50
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 10, 2026
@ahg-g (Contributor) commented Mar 11, 2026

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 11, 2026
@k8s-ci-robot k8s-ci-robot merged commit 810a2f0 into kubernetes-sigs:main Mar 11, 2026
11 checks passed
BizerNotNull pushed a commit to BizerNotNull/gateway-api-inference-extension that referenced this pull request Mar 15, 2026
* flowcontrol: add benchmark suite

Introduces a synchronous steady-state benchmark harness to evaluate the
Flow Control layer's algorithmic throughput and scalability.

* rebase onto main
elevran pushed a commit to llm-d/llm-d-inference-scheduler that referenced this pull request Apr 23, 2026
…ce-extension#2539)

* flowcontrol: add benchmark suite

Introduces a synchronous steady-state benchmark harness to evaluate the
Flow Control layer's algorithmic throughput and scalability.

* rebase onto main

Labels

approved (Indicates a PR has been approved by an approver from all required OWNERS files.)
cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
lgtm ("Looks good to me", indicates that a PR is ready to be merged.)
size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files.)


4 participants