@yannicks1
Collaborator

VLLM_SPYRE_ENABLE_PREFILL_OPTIMIZATION is on by default and we have not found any reason to ever turn it off.

reasons for removing:

  • getting rid of dead code -> simplifying scheduler constraints
  • being consistent with chunked prefill scheduler: the variable is not used there
  • reducing GHA time by removing tests targeting this optimization specifically
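To illustrate the dead-code argument, here is a minimal sketch of what a default-on flag implies for the scheduler. The helper names, the "0"/"1" parsing, and the path labels are hypothetical and are not taken from the vllm-spyre source; only the variable name comes from this PR.

```python
import os

# Variable name from this PR; everything else below is a hypothetical sketch.
FLAG = "VLLM_SPYRE_ENABLE_PREFILL_OPTIMIZATION"


def flag_enabled(environ=None) -> bool:
    """Assumed parsing: enabled unless the variable is explicitly set to "0"."""
    env = os.environ if environ is None else environ
    return env.get(FLAG, "1") != "0"


def scheduler_path_before(environ=None) -> str:
    """Before removal: two code paths, each needing its own constraints and tests."""
    if flag_enabled(environ):
        return "optimized"
    # Only reachable when someone explicitly sets the flag to "0".
    return "legacy"


def scheduler_path_after() -> str:
    """After removal: a single path, so the legacy branch and its tests go away."""
    return "optimized"
```

If nobody ever sets the variable to "0", the `legacy` branch never runs, which is why dropping the flag also drops the tests that exercised it.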

@github-actions

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

Signed-off-by: Yannick Schnider <[email protected]>
Collaborator

@maxdebayser maxdebayser left a comment

Thanks, this makes sense.

Collaborator

@sducouedic sducouedic left a comment

there are still two tests referring to prefill optimization:

test_requests_use_full_batch_tkv_limit_prefill_opt and test_requests_exceed_batch_tkv_limit_prefill_opt

@yannicks1
Collaborator Author

good catch, will change the names of these tests.

Signed-off-by: Yannick Schnider <[email protected]>
@yannicks1 yannicks1 enabled auto-merge (squash) November 19, 2025 20:37
@github-actions github-actions bot added the "ready" label (Runs the full CI test suite. Only add to PRs once ready to merge to limit public GHA usage) Nov 19, 2025
Collaborator

@sducouedic sducouedic left a comment

LGTM, definitely a no-brainer to have the optimization enabled all the time

@yannicks1 yannicks1 merged commit 087d75f into main Nov 19, 2025
21 of 42 checks passed
@yannicks1 yannicks1 deleted the ysc-remove-prefill-opt-var branch November 19, 2025 20:49
yannicks1 added a commit that referenced this pull request Nov 20, 2025
applying the tighter constraint for the max model length to the
continuous batching scheduler too.
this establishes parity between the chunked prefill and continuous
batching constraints.

see discussion
[here](#562 (comment))

Signed-off-by: Yannick Schnider <[email protected]>