Set default chunk size to 4k for granite 3 8b TP4 #571
Signed-off-by: Travis Johnson <[email protected]>
👋 Hi! Thank you for contributing to vLLM support on Spyre. Or this can be done with Now you are good to go 🚀
yannicks1 left a comment
could we assert that backend is Spyre? this would allow us to still pass different chunk sizes when testing on cpu.
good idea
Signed-off-by: Travis Johnson <[email protected]>
Force-pushed from 030de79 to 5989718
I also changed the code to allow
Description
Sets the default chunk size for granite 3 8b TP4 to the expected/supported value of 4096.
Note that using `--max-num-batched-tokens` does not override this setting, but setting `VLLM_DT_CHUNK_LEN` directly will take precedence.
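
The precedence described above could be sketched roughly as follows. This is an illustration only, not the code in this PR: `resolve_chunk_len`, its parameters, and the `on_spyre` flag are hypothetical names, and the Spyre-only check reflects the review suggestion rather than a confirmed implementation. Only `VLLM_DT_CHUNK_LEN`, `--max-num-batched-tokens`, and the 4096 default come from the description.

```python
import os
from typing import Mapping, Optional

# Default from the PR description for granite 3 8b with TP4.
GRANITE_3_8B_TP4_CHUNK_LEN = 4096


def resolve_chunk_len(
    is_granite_3_8b_tp4: bool,            # hypothetical: model/TP detection happens elsewhere
    on_spyre: bool,                       # hypothetical: True when the backend is Spyre
    max_num_batched_tokens: Optional[int],
    env: Mapping[str, str] = os.environ,
) -> Optional[int]:
    """Pick the chunk length using the precedence stated in the description."""
    # 1. An explicitly set VLLM_DT_CHUNK_LEN always wins.
    explicit = env.get("VLLM_DT_CHUNK_LEN")
    if explicit is not None:
        return int(explicit)

    # 2. For granite 3 8b TP4 on Spyre, the default applies and
    #    --max-num-batched-tokens does not override it.
    if is_granite_3_8b_tp4 and on_spyre:
        return GRANITE_3_8B_TP4_CHUNK_LEN

    # 3. Otherwise (for example when testing on CPU) the user-supplied
    #    value is used unchanged.
    return max_num_batched_tokens
```

Under these assumptions, `resolve_chunk_len(True, True, 2048, env={})` yields the 4096 default, while passing `env={"VLLM_DT_CHUNK_LEN": "1024"}` yields 1024.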