test: a few test configuration updates to have chunked prefill tests pass on Spyre #588
Conversation
Signed-off-by: Travis Johnson <[email protected]>
```python
        "use_cb": False,
        "warmup_shapes": warmup_shapes,
    })
    patch_environment(
```
Nice simplification!
```python
    * number of prompts: 1
    * 0: len = 512, max tokens = 1, step joining = 0
    """
    # max_model_len=514 tests an edge case in the scheduler, but does not work
```
Nice catch. Maybe in `sendnn` the max model len has to be a multiple of 64.
I was wondering something like that as well, but values like 576, 640, 768 also didn't work in my testing.
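A quick arithmetic check of the values mentioned in this thread supports that doubt: 514 is indeed not a multiple of 64, but 576, 640, and 768 all are, and those reportedly failed too, so divisibility by 64 alone doesn't explain the failures.

```python
# Values from the discussion above: only 514 fails the multiple-of-64 test,
# yet 576, 640, and 768 (all multiples of 64) also failed in practice.
assert 514 % 64 != 0
assert all(v % 64 == 0 for v in (576, 640, 768))
```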
maxdebayser left a comment:
LGTM, thanks for the fixes!
```python
    monkeypatch.setenv("VLLM_SPYRE_USE_CHUNKED_PREFILL",
                       "1" if use_chunked_prefill else "0")
    # NB: setting this env var explicitly is needed to set the desired value for
    # the chunk size in the case that granite 8b TP4 is detected
```
In that case, wouldn't we also need to re-override the internal config for `max_num_batched_tokens`?
`VLLM_DT_CHUNK_LEN` currently takes precedence over the user/vLLM setting of `max_num_batched_tokens` after the changes in #571.
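The precedence rule being described can be sketched roughly like this (the function `resolve_chunk_size` is illustrative only; the real resolution logic lives inside the Spyre plugin after #571):

```python
import os

def resolve_chunk_size(max_num_batched_tokens: int) -> int:
    # Hypothetical illustration: the env var, when set, wins over the
    # user/vLLM-configured max_num_batched_tokens.
    env = os.environ.get("VLLM_DT_CHUNK_LEN")
    if env is not None:
        return int(env)
    return max_num_batched_tokens

os.environ["VLLM_DT_CHUNK_LEN"] = "2048"
assert resolve_chunk_size(4096) == 2048  # env var takes precedence

os.environ.pop("VLLM_DT_CHUNK_LEN")
assert resolve_chunk_size(4096) == 4096  # falls back to the config value
```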
very hot 🌶️
Description
In our internal CI we run unit tests on Spyre devices. This PR fixes some of the new Chunked Prefill tests to be able to pass on Spyre.
The two issues causing tests to fail:
- `test_single_cp_prefill` causing failure during inference with `DtException: No matching compiler iter found`

Also found that having `@pytest.mark.cpu` on all tests in `test_spyre_cp_scheduler_steps.py` is incorrect (the mark is automatically applied from the backend parameterization).
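The "mark applied from the backend parameterization" mechanism can be sketched with `pytest.param(..., marks=...)` (the backend names and the `spyre` mark here are illustrative, not the repo's actual fixtures): any test that parameterizes over such a list inherits the per-parameter marks, so a blanket `@pytest.mark.cpu` decorator on every test would be redundant, and wrong for the non-CPU cases.

```python
import pytest

# Hypothetical backend list for illustration: marks attached via
# pytest.param are applied per-parameter to every test using this list.
BACKENDS = [
    pytest.param("eager", marks=pytest.mark.cpu),
    pytest.param("sendnn", marks=pytest.mark.spyre),
]

@pytest.mark.parametrize("backend", BACKENDS)
def test_scheduler_step(backend):
    # Each parametrized case carries only its own backend's mark.
    assert backend in ("eager", "sendnn")
```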