
Conversation

@tjohnson31415 (Collaborator) commented Dec 4, 2025

Description

In our internal CI we run unit tests on Spyre devices. This PR fixes some of the new Chunked Prefill tests so that they pass on Spyre.

The two issues causing tests to fail:

  • the chunk size exceeding the max model length (because the vLLM default is 2048, or granite TP4 detection sets it to 4096)
  • an edge case I don't fully understand in test_single_cp_prefill that causes inference to fail with `DtException: No matching compiler iter found`
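The first issue amounts to a simple clamp. A minimal sketch, assuming a hypothetical helper (`choose_chunk_size` is illustrative, not code from this PR):

```python
def choose_chunk_size(default_chunk: int, max_model_len: int) -> int:
    """Hypothetical helper illustrating the first fix: the prefill chunk
    size must never exceed the model's maximum sequence length, so the
    configured default is clamped down to max_model_len."""
    return min(default_chunk, max_model_len)

# With vLLM's default chunk of 2048 and a test max_model_len of 512,
# the effective chunk size drops to 512.
print(choose_chunk_size(2048, 512))
```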

I also found that applying @pytest.mark.cpu to every test in test_spyre_cp_scheduler_steps.py is incorrect: the mark is already applied automatically by the backend parameterization.
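The mark-from-parameterization pattern can be sketched as follows; the backend names and mark wiring here are assumptions for illustration, not the repo's actual fixture code:

```python
import pytest

# When the backend parameterization itself carries the "cpu" mark,
# an extra @pytest.mark.cpu decorator on each test is redundant.
BACKENDS = [
    pytest.param("eager", marks=pytest.mark.cpu),  # mark attached here
]

@pytest.mark.parametrize("backend", BACKENDS)
def test_scheduler_step(backend):
    # The test picks up the "cpu" mark from its parameter, not from a
    # decorator on the function itself.
    assert backend == "eager"
```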

@github-actions bot commented Dec 4, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure that your code passes all the linting checks, otherwise your PR won't be mergeable. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

"use_cb": False,
"warmup_shapes": warmup_shapes,
})
patch_environment(
Nice simplification!

* number of prompts: 1
* 0: len = 512, max tokens = 1, step joining = 0
"""
# max_model_len=514 tests an edge case in the scheduler, but does not work
Nice catch. Maybe in sendnn the max model len has to be a multiple of 64

@tjohnson31415 (Author) replied:
I was wondering something like that as well, but values like 576, 640, and 768 also didn't work in my testing.

@maxdebayser (Collaborator) left a comment:
LGTM, thanks for the fixes!

monkeypatch.setenv("VLLM_SPYRE_USE_CHUNKED_PREFILL",
"1" if use_chunked_prefill else "0")
# NB: setting this env var explicitly is needed to set the desired value for
# the chunk size in the case that granite 8b TP4 is detected
In that case wouldn't we also need to re-override the internal config for max_num_batched_tokens as well?

@tjohnson31415 (Author) replied Dec 4, 2025:
VLLM_DT_CHUNK_LEN currently takes precedence over the user- or vLLM-set max_num_batched_tokens after the changes in #571.
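The precedence described here can be sketched roughly as below; `effective_chunk_len` is an assumed name for illustration, not the actual function in the codebase:

```python
import os

def effective_chunk_len(max_num_batched_tokens: int) -> int:
    """Sketch of the described precedence: an explicitly set
    VLLM_DT_CHUNK_LEN env var overrides whatever max_num_batched_tokens
    the user or the vLLM config computed."""
    env_val = os.environ.get("VLLM_DT_CHUNK_LEN")
    if env_val is not None:
        return int(env_val)
    return max_num_batched_tokens

os.environ["VLLM_DT_CHUNK_LEN"] = "1024"
print(effective_chunk_len(4096))  # env var wins: 1024
```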

very hot 🌶️

@tjohnson31415 tjohnson31415 merged commit ef47741 into main Dec 4, 2025
18 of 19 checks passed
@tjohnson31415 tjohnson31415 deleted the test-cp-spyre branch December 4, 2025 20:42