
Tkv limit bug #954

Merged

joerunde merged 8 commits into torch-spyre:main from Daniel-Schenker:tkv_limit_bug
May 4, 2026

Conversation

@Daniel-Schenker
Collaborator

Description

This PR fixes a bug in the scheduler code where requests that would eventually exceed the max batch tkv limit could be admitted into an ongoing batch. It includes a new test case that reproduces the error and the fix for it.
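At a high level, the fix accounts for the new request's eventual token length, rounded up for padding, before admitting it. A minimal sketch of that reasoning, with illustrative names (the real logic lives in sendnn_inference/v1/core/scheduler.py):

def fits_tkv_limit(new_req_tkv: int, max_tokens: int, batch_max_tkv: int,
                   batch_size: int, tkv_limit: int) -> bool:
    # Effective token length the new request will reach once all of its
    # output tokens have been generated
    new_req_max_tkv = new_req_tkv + max_tokens - 1
    # Rounded up to the end of the block to account for padding
    new_req_max_tkv = round_up_to_block_size(new_req_max_tkv)
    # The tkv is shared across the batch, so the limit must hold for the
    # enlarged batch at its eventual maximum tkv
    return max(batch_max_tkv, new_req_max_tkv) * (batch_size + 1) <= tkv_limit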

Related Issues

Test Plan

Run the test file with: pytest tests/v1/worker/test_scheduler_tkv_limits.py -sv
The second test case will pass with the changes to sendnn_inference/v1/core/scheduler.py included in this PR, and will fail without them.

Checklist

  • I have read the contributing guidelines
  • My code follows the project's code style (run bash format.sh)
  • I have added tests for my changes (if applicable)
  • I have updated the documentation (if applicable)
  • My commits include a Signed-off-by: line (DCO compliance)

@github-actions

👋 Hi! Thank you for contributing.
Just a reminder: make sure that your code passes all the linting checks, otherwise your PR can't be merged. To do so, run ./format.sh.
Now you are good to go 🚀.

We also recommend installing prek and configuring it to check your code before every local commit.

@Daniel-Schenker
Collaborator Author

Initial proposed changes are here. More than happy to address any comments and to add or change code and comments if anything is unclear.

Comment thread: sendnn_inference/v1/core/scheduler.py (outdated)
# Helper function to round up to the nearest block size
# Uses bitwise alignment for better performance
block_size = SpyrePlatform.get_block_size()
return (n + block_size - 1) & ~(block_size - 1)
Collaborator


I'm not sure we should make this generic, because the bitwise ops only work if the block size is a power of 2. What about hardcoding 64 here, and sticking an assert in the calling code that block_size == 64?

I don't feel strongly about this, since we never expect the block size to change; just a slight preference to not leave a foot-gun lying around.
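Something like this, as a sketch of the suggestion (the assert placement is illustrative):

def round_up_to_block_size(n: int) -> int:
    # Round up to the nearest multiple of 64; the bit trick is only valid
    # because 64 is a power of two
    return (n + 63) & ~63

# In the calling code:
assert SpyrePlatform.get_block_size() == 64, \
    "bitwise rounding assumes a block size of 64"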

Collaborator Author


Addressed

monkeypatch.setenv("VLLM_DT_MAX_BATCH_TKV_LIMIT", "131072")

# Define prompt lengths for first set of requests
prompt_lengths_1 = [
Collaborator


[1024] * 16?

or [1018] + [1024] * 15 if the first 1018 is relevant

Collaborator Author


Addressed. I also realized that the second set of prompt lengths was identical to the first; the only difference between the two sets of requests is the max tokens, so I consolidated that as well.

Comment thread: sendnn_inference/v1/core/scheduler.py (outdated)

# Compute the effective token length of the new request
new_req_max_tkv = new_req_tkv + request.max_tokens - 1
# Rounded up to the nearest block size
Collaborator


Should we mention here (and below) that we're rounding up to the end of the block to account for padding?

Collaborator Author


Addressed


@pytest.mark.cpu
@pytest.mark.chunked_prefill
def test_scheduler_tkv_limits_ongoing_batch(monkeypatch: pytest.MonkeyPatch):
Collaborator


A high-level overview of what's going on here would be great.

IIUC this situation is:

  • We first schedule a 16x8k batch that would fully fill the 128k TKV limit (see the arithmetic check after this list)
  • We then inject a batch of smaller requests partway through processing, which should be able to schedule only because they are guaranteed to finish processing just before the TKV is long enough to overrun the limit with the larger batch size
  • This flexes the logic for injecting shorter requests into a running batch, which is not tested by the other test in this file
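For concreteness, the arithmetic behind the first bullet, using the VLLM_DT_MAX_BATCH_TKV_LIMIT of 131072 set in the test:

# 16 requests, each reaching a tkv of 8192 (8k), exactly fill the limit
assert 16 * 8192 == 131072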

Collaborator Author


Added

@Daniel-Schenker
Collaborator Author

Initial comments have been addressed. I have re-confirmed that the new test case fails without scheduler.py changes and passes with changes.

Collaborator

@yannicks1 left a comment


To me it looks like this bug was (at least partially) introduced in #913 (please correct me if I am wrong). Prior to #913 we passed n_blocks, which carried information about the new request, into the function and did the calculation based on it. While we fixed an edge case in #913, the new approach no longer used n_blocks, so the new request was not considered.

With this PR we consider the new request again while still fixing the edge case bug.

def round_up_to_block_size(n: int) -> int:
    # Helper function to round up to the nearest block size
    # Uses bitwise alignment for better performance
    return (n + 63) & ~63
Collaborator


spicy stuff 🌶️
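For the record, a quick worked example of why the masking is correct for a power-of-two block size:

# (n + 63) & ~63 adds 63, then clears the low six bits:
#   n = 100: 100 + 63 = 163 = 0b10100011; 163 & ~63 = 0b10000000 = 128
#   n = 128: 128 + 63 = 191; 191 & ~63 = 128 (already aligned, unchanged)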

Comment thread: sendnn_inference/v1/core/scheduler.py (outdated)
Collaborator


n_blocks is unused, please remove

Collaborator Author


Removed

@yannicks1
Collaborator

> Initial comments have been addressed. I have re-confirmed that the new test case fails without scheduler.py changes and passes with changes.

Could you quickly run your newly added test (the one that captured the new failure) against a pre-#913 commit? That would help me understand better...

Signed-off-by: Daniel Schenker <[email protected]>
Comment thread: sendnn_inference/v1/core/scheduler.py (outdated)
Collaborator


this needs to go too ...

Collaborator Author


Removed

Signed-off-by: Daniel Schenker <[email protected]>
Collaborator

@joerunde left a comment


ship it!

@joerunde joerunde merged commit 395d367 into torch-spyre:main May 4, 2026
11 checks passed