-
Notifications
You must be signed in to change notification settings - Fork 31
[CB] remove env var VLLM_SPYRE_ENABLE_PREFILL_OPTIMIZATION #562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Yannick Schnider <[email protected]>
Signed-off-by: Yannick Schnider <[email protected]>
|
👋 Hi! Thank you for contributing to vLLM support on Spyre. Or this can be done with Now you are good to go 🚀 |
Signed-off-by: Yannick Schnider <[email protected]>
Signed-off-by: Yannick Schnider <[email protected]>
maxdebayser
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this makes sense.
sducouedic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are still two tests with referring to prefill optimization:
test_requests_use_full_batch_tkv_limit_prefill_opt and test_requests_exceed_batch_tkv_limit_prefill_opt
|
good catch, will change the names of these tests. |
Signed-off-by: Yannick Schnider <[email protected]>
Signed-off-by: Yannick Schnider <[email protected]>
sducouedic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, definitively a no brainer to have the optimization enabled all the time
applying the tighter constraint for the max model length to the continuous batching scheduler too. this establishes parity between the chunked prefill and continuous batching constraints. see discussion [here](#562 (comment)) Signed-off-by: Yannick Schnider <[email protected]>
VLLM_SPYRE_ENABLE_PREFILL_OPTIMIZATION is on by default and we have not found any reason to ever turn it off.
reasons for removing: