Conversation

@yannicks1 (Collaborator)

The previous comment was slightly misleading: VLLM_SPYRE_WARMUP_NEW_TOKENS is an upper limit, imposed by the Spyre compiler, on the number of generated output tokens in static batching, while SamplingParams.max_tokens is a user-defined parameter that must obey the compiler constraint in both static and continuous batching.

By rephrasing the continuous batching comment to:

The number of generated output tokens is implicitly limited by max-model-len - padded_prompt_length

both the static and continuous batching sections now refer to the compiler-side upper limit.
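The limit described above can be sketched as a small helper. This is a minimal illustration, not Spyre's actual implementation: the function names, the padding block size of 64, and the ceiling-based padding scheme are all assumptions made for the example.

```python
# Hypothetical sketch of the compiler-side limit: the number of generated
# output tokens is implicitly capped by max_model_len - padded_prompt_length.
# The block size of 64 and the round-up padding rule are assumptions.

def padded_prompt_length(prompt_len: int, block_size: int = 64) -> int:
    """Round the prompt length up to the next multiple of the padding block."""
    return -(-prompt_len // block_size) * block_size  # ceiling division

def max_new_tokens(max_model_len: int, prompt_len: int, block_size: int = 64) -> int:
    """Upper limit on generated tokens once the padded prompt is accounted for."""
    return max(0, max_model_len - padded_prompt_length(prompt_len, block_size))

print(max_new_tokens(2048, 100))  # prompt pads to 128 -> 1920
```

Under these assumptions, a request with SamplingParams.max_tokens larger than this value would be truncated at the implicit limit.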

@yannicks1 yannicks1 requested a review from rafvasq as a code owner October 10, 2025 16:04
@github-actions

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure that your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@joerunde (Collaborator) left a comment

lgtm!

@yannicks1 yannicks1 merged commit 0afb9f6 into main Oct 10, 2025
19 checks passed
@yannicks1 yannicks1 deleted the ysc-revise-docs branch October 10, 2025 21:44