Skip to content

Exceed TKV Limit Unit Test and Bugfix#913

Merged
joerunde merged 7 commits into
torch-spyre:mainfrom
Daniel-Schenker:mistral_tkv_bug
Apr 14, 2026
Merged

Exceed TKV Limit Unit Test and Bugfix#913
joerunde merged 7 commits into
torch-spyre:mainfrom
Daniel-Schenker:mistral_tkv_bug

Conversation

@Daniel-Schenker
Copy link
Copy Markdown
Collaborator

@Daniel-Schenker Daniel-Schenker commented Apr 13, 2026

Description

This PR adds a unit test case to reproduce [redacted]
A potential fix for the bug is also included in this PR.

Related Issues

Relates to [redacted]

Test Plan

Utilize existing and newly added vllm-spyre unit tests

Checklist

  • I have read the contributing guidelines
  • My code follows the project's code style (run bash format.sh)
  • I have added tests for my changes (if applicable)
  • I have updated the documentation (if applicable)
  • My commits include a Signed-off-by: line (DCO compliance)

@github-actions
Copy link
Copy Markdown

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, run ./format.sh.
Now you are good to go 🚀.

We also recommend installing prek and configuring it to check your code before every local commit.

@Daniel-Schenker Daniel-Schenker changed the title Add tkv limit unit test Exceed TKV Limit Unit Test and Bugfix Apr 13, 2026
@Daniel-Schenker Daniel-Schenker self-assigned this Apr 13, 2026
@joerunde
Copy link
Copy Markdown
Collaborator

bot:test
MARKERS="spyre and not quantized"

Comment thread vllm_spyre/v1/core/scheduler.py Outdated
max_num_batched_tokens=512,
max_num_seqs=32,
max_model_len=32768,
available_blocks=32768,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this having an impact here, but maybe set this to 8192 as this is the max number of blocks we currently support? 32K blocks (of 64 tokens) seems like a looot.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we did actually have available_blocks=8192 at first... But there was an error stating that 16385 blocks were needed. Maybe @joerunde can comment further on this.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, though we ended up with some startup errors because the max batch tkv limit wasn't being respected, which ended up crashing because we calculated a required 16k blocks 😢

We should probably understand why the env patch wasn't getting picked up, but that also doesn't need t block getting this in

Comment thread tests/v1/worker/test_scheduler_tkv_limits.py Outdated
@prashantgupta24
Copy link
Copy Markdown
Collaborator

@Daniel-Schenker it looks like we have internal GH links in the PR description. We were warned not to do that

@joerunde
Copy link
Copy Markdown
Collaborator

^^ good catch @prashantgupta24!

@Daniel-Schenker
Copy link
Copy Markdown
Collaborator Author

@joerunde @yannicks1 I have cleaned up the test case a bit and added some (hopefully) helpful comments as this is pretty niche test. Let me know if it looks good.

@Daniel-Schenker
Copy link
Copy Markdown
Collaborator Author

I confirmed that the test was still failing without scheduler changes and passing after changes.

Signed-off-by: Daniel Schenker <[email protected]>
Signed-off-by: Daniel Schenker <[email protected]>
Signed-off-by: Daniel Schenker <[email protected]>
Signed-off-by: Daniel Schenker <[email protected]>
Signed-off-by: Daniel Schenker <[email protected]>
Comment thread tests/v1/worker/test_scheduler_tkv_limits.py Outdated
Signed-off-by: Daniel Schenker <[email protected]>
@Daniel-Schenker
Copy link
Copy Markdown
Collaborator Author

Comments have been addressed.

@joerunde joerunde enabled auto-merge (squash) April 14, 2026 20:16
@github-actions github-actions Bot added the ready Runs the full CI test suite. Only add to PRs once ready to merge to limit public GHA usage label Apr 14, 2026
Copy link
Copy Markdown
Collaborator

@joerunde joerunde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get it!

@joerunde joerunde merged commit 83d5ccc into torch-spyre:main Apr 14, 2026
21 of 22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready Runs the full CI test suite. Only add to PRs once ready to merge to limit public GHA usage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants