[WIP] fix check_batch_tkv_limit_cp conservatively by yannicks1 · Pull Request #961 · torch-spyre/sendnn-inference

yannicks1 · 2026-05-08T10:17:02Z

Added an extra block of slack for both new and decode requests. Left-padding can push a sequence's runtime tkv up to one block past the scheduler's estimate when the batch re-aligns on admission.

github-actions · 2026-05-08T10:17:12Z

👋 Hi! Thank you for contributing.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, run ./format.sh.
Now you are good to go 🚀.

We also recommend installing prek and configuring it to check your code before every local commit.

yannicks1 · 2026-05-08T10:17:40Z

bot:bench
NUM_PROMPTS=1000
MAX_RUN_TIME=36000
IGNORE_EOS=1
CUSTOM_OUTPUT_LEN=-1
MAX_CONCURRENT=4

yannicks1 · 2026-05-08T12:08:23Z

bot:bench
NUM_PROMPTS=1000
MAX_RUN_TIME=36000
IGNORE_EOS=1
CUSTOM_OUTPUT_LEN=-1
MAX_CONCURRENT=8

yannicks1 · 2026-05-08T12:14:46Z

bot:test

yannicks1 · 2026-05-08T13:42:10Z

bot:bench
NUM_PROMPTS=1000
MAX_RUN_TIME=36000
IGNORE_EOS=1
CUSTOM_OUTPUT_LEN=-1
MAX_CONCURRENT=8

joerunde · 2026-05-08T15:08:17Z

        new_req_max_tkv = round_up_to_block_size(new_req_tkv + request.max_tokens - 1)
+        # Extra block of slack: left-padding can push a sequence's runtime tkv up to
+        # one block past the scheduler's estimate when the batch re-aligns on admission.
+        new_req_max_tkv += self.block_size


Dang, yeah. My reasoning for only needing to round up was that the padding only happens when a request actually gets to the next block. So if our math is correct here, you'd only get to the next block if some request was going to end inside it, in which case that request's round-up would cover every other request that gets padded to somewhere within that block.

But clearly that reasoning isn't sound 🙃
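
For readers following the arithmetic in the reviewed diff, here is a minimal standalone sketch of the conservative bound. It is not the repository's implementation: the free-function form, the `block_size` parameter on `round_up_to_block_size`, the name `conservative_max_tkv`, and the 64-token block size are all assumptions made for illustration.

```python
def round_up_to_block_size(n: int, block_size: int = 64) -> int:
    """Round n up to the nearest multiple of block_size.

    Hypothetical stand-in for the helper used in the diff; the real
    signature may differ.
    """
    return -(-n // block_size) * block_size


def conservative_max_tkv(new_req_tkv: int, max_tokens: int, block_size: int = 64) -> int:
    """Upper bound on a request's tkv: round-up estimate plus one block of slack."""
    # Round-up-only estimate, as in the line preceding the change.
    estimate = round_up_to_block_size(new_req_tkv + max_tokens - 1, block_size)
    # Extra block to cover left-padding when the batch re-aligns on admission.
    return estimate + block_size


# A request at tkv 100 that may generate up to 30 tokens, with 64-token blocks:
# the round-up estimate is 192, and the extra block raises the bound to 256.
bound = conservative_max_tkv(new_req_tkv=100, max_tokens=30)
```

The trade-off is that the bound is always one block larger than the round-up estimate, so the check may reject some batches that would in fact have fit.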

joerunde

ship it

check_batch_tkv_limit_cp: include a safety margin of 1 block to account for all edge cases

64ed921

Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>

joerunde reviewed May 8, 2026

joerunde approved these changes May 8, 2026

joerunde marked this pull request as ready for review May 8, 2026 17:30

joerunde requested review from nikolaospapandreou, sducouedic and tdoublep as code owners May 8, 2026 17:30

sducouedic approved these changes May 8, 2026

sducouedic merged commit ce90aab into torch-spyre:main May 8, 2026
17 of 18 checks passed

sducouedic merged 1 commit into torch-spyre:main from yannicks1:tkv-batch-check-safety-margin
