-
Notifications
You must be signed in to change notification settings - Fork 31
fix: logits processors for CB #484
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Wallas Santos <[email protected]>
|
👋 Hi! Thank you for contributing to vLLM support on Spyre. Or this can be done with Now you are good to go 🚀 |
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
|
bot:test |
tests/e2e/test_sampling_params.py
Outdated
|
|
||
| pytestmark = [pytest.mark.full_model, pytest.mark.other_e2e] | ||
|
|
||
| # TODO: REVERT THIS CHANGE! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we parametrize the test such that they get executed for SB and CB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or do we already have a similar test for CB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not have this test for CB. I think the issue here is increase too much the time of CI. Moreover, they do not repro very well the issue of this PR.
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
|
bot:test |
maxdebayser
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ok as a stopgap solution, but we should think about a more comprehensive solution later when more of the sampling params are implemented with logits processors. There could be performance implications in applying the LPs per requests vs per batch.
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
|
bot:test |
1 similar comment
|
bot:test |
…-fix-cb-logits-processors
|
bot:test |
…-fix-cb-logits-processors
fix: upgrade aftu Signed-off-by: Wallas Santos <[email protected]>
…ject/vllm-spyre into wallas-fix-cb-logits-processors
|
bot:test |
|
bot:test |
Description
This PR introduces a wrapper of the logits processor that are injected to vlllm that will handle the logits processor in a distributed way. The wrapper is initialized with the logits class and the batch_size. So, from the logits processor perspective it will "think" that it's only handling a request per step, while the wrapper receives the batch of logits, slice and redistribute for each request.