[BugFix] Fix async scheduling + chunked prefill + preemption #28787
Signed-off-by: Nick Hill <[email protected]>
Code Review
This pull request addresses a bug related to async scheduling, chunked prefill, and preemption. The core of the fix is in vllm/v1/core/sched/scheduler.py, where previously only a slice of token IDs was being sent for requests undergoing chunked prefill. This was incorrect as the connector requires the full list of tokens for prefix matching. The change correctly sends a full copy of the token IDs. To facilitate this, a copy() method was added to the ConstantList class. The re-enabling of previously skipped tests in tests/v1/e2e/test_async_scheduling.py validates the fix. The changes are correct, well-targeted, and I have no further recommendations.
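As a rough illustration of the `copy()` addition mentioned in the review, the sketch below shows a read-only list view that can hand out an independent snapshot. `ReadOnlyList` is a hypothetical stand-in written for this example; it is not vLLM's actual `ConstantList` implementation, whose details are not shown in this thread.

```python
from collections.abc import Sequence


class ReadOnlyList(Sequence):
    """Hypothetical read-only view over a list (not vLLM's ConstantList)."""

    def __init__(self, items: list[int]) -> None:
        # The view shares storage with the owner; it does not copy on creation.
        self._items = items

    def __getitem__(self, index):
        # Supports both integer indices and slices, like a plain list.
        return self._items[index]

    def __len__(self) -> int:
        return len(self._items)

    def copy(self) -> list[int]:
        # Return a full, independent snapshot of the underlying list.
        return self._items.copy()


owner = [1, 2, 3, 4, 5]
view = ReadOnlyList(owner)
snapshot = view.copy()

# Later mutations by the owner are visible through the view,
# but the snapshot keeps the contents it had when it was taken.
owner.append(6)
```

The design point is that a snapshot taken with `copy()` is complete and detached from the owner, whereas a slice of the view only carries part of the sequence.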
…oject#28787) Signed-off-by: Nick Hill <[email protected]> Signed-off-by: Bram Wasti <[email protected]>
…oject#28787) Signed-off-by: Nick Hill <[email protected]>
…oject#28787) Signed-off-by: Nick Hill <[email protected]> Signed-off-by: Xingyu Liu <[email protected]>
Small fix, took a while to track down 😅
The populated all_token_ids should never have been truncated, since they are used to recover output ids following preemption in the async scheduling case.
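A toy model of why truncation loses information, assuming that the token list is the prompt ids followed by the generated ids and that output ids are recovered by dropping the prompt prefix. The helper `recover_output_ids`, the token values, and the slice bound are all hypothetical and chosen only for illustration; the real scheduler logic may differ.

```python
def recover_output_ids(all_token_ids: list[int], num_prompt_tokens: int) -> list[int]:
    """Hypothetical helper: everything after the prompt is treated as output."""
    return list(all_token_ids[num_prompt_tokens:])


prompt_ids = [101, 102, 103, 104]
generated_ids = [7, 8, 9]
all_token_ids = prompt_ids + generated_ids

# Assumed scenario: the request is preempted and resumed with chunked
# prefill, and only the first chunk of tokens has been scheduled so far.
num_scheduled = 2  # hypothetical chunk boundary

truncated = all_token_ids[:num_scheduled]  # slice: the tail is missing
full = all_token_ids.copy()                # full copy: nothing is missing

lost = recover_output_ids(truncated, len(prompt_ids))
kept = recover_output_ids(full, len(prompt_ids))
```

With the truncated list the previously generated ids cannot be recovered, while the full copy still contains them.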