Tkv limit bug #954
Conversation
Force-pushed from b9c268d to 4344ffa.
Initial proposed changes are here. More than happy to address any comments and add/change code and comments if things aren't clear.
```python
# Helper function to round up to the nearest block size
# Uses bitwise alignment for better performance
block_size = SpyrePlatform.get_block_size()
return (n + block_size - 1) & ~(block_size - 1)
```
I'm not sure we should make this generic, because the bitwise ops only work if the block size is a power of 2. What about hardcoding 64 here, and sticking an assert in the calling code that block_size == 64?
I don't feel strongly about this, because we never expect the block size to change; it's only a slight preference not to leave a foot-gun lying around.
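A hedged sketch of the reviewer's suggestion: keep the bitwise trick but guard it, since `(n + block_size - 1) & ~(block_size - 1)` is only correct when the block size is a power of two. The names and default here are illustrative, not the PR's actual code.

```python
def round_up_to_block_size(n: int, block_size: int = 64) -> int:
    # A power of two has exactly one bit set, so
    # block_size & (block_size - 1) == 0 holds only for powers of two.
    assert block_size > 0 and block_size & (block_size - 1) == 0, \
        "bitwise rounding requires a power-of-two block size"
    return (n + block_size - 1) & ~(block_size - 1)
```

For a non-power-of-two size like 48, the bitwise form silently returns wrong values (it rounds 50 down past the correct answer of 96 to 64), which is exactly the foot-gun the assert removes.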
```python
monkeypatch.setenv("VLLM_DT_MAX_BATCH_TKV_LIMIT", "131072")
```

```python
# Define prompt lengths for first set of requests
prompt_lengths_1 = [
```
`[1024] * 16`?
or `[1018] + [1024] * 15` if the first 1018 is relevant
Addressed. I also realized that the second set of prompt lengths was identical to the first; the only difference between the two sets of requests is the max tokens, so I consolidated that as well.
```python
# Compute the effective token length of the new request
new_req_max_tkv = new_req_tkv + request.max_tokens - 1
# Rounded up to the nearest block size
```
Should we mention here (and below) that we're rounding up to the end of the block to account for padding?
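To make the padding rationale concrete, a sketch of the computation the comment describes (block size of 64 assumed; the helper name is hypothetical, and the real code would query the platform for the block size):

```python
BLOCK_SIZE = 64  # assumed; the real code uses SpyrePlatform.get_block_size()

def padded_max_tkv(new_req_tkv: int, max_tokens: int) -> int:
    # Effective token length of the new request once it has generated
    # all max_tokens output tokens
    new_req_max_tkv = new_req_tkv + max_tokens - 1
    # Round up to the end of the block: sequences are padded to full
    # blocks, so the padded length is what counts against the TKV limit
    return (new_req_max_tkv + BLOCK_SIZE - 1) // BLOCK_SIZE * BLOCK_SIZE
```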
```python
@pytest.mark.cpu
@pytest.mark.chunked_prefill
def test_scheduler_tkv_limits_ongoing_batch(monkeypatch: pytest.MonkeyPatch):
```
A high-level overview of what's going on here would be great.
IIUC this situation is:
- We first schedule a 16x8k batch that would fully fill the 128k TKV limit
- We then inject a batch of smaller requests partway through processing, which should be able to schedule only because they are guaranteed to finish processing just before the TKV is long enough to overrun the limit with the larger batch size
- This flexes the logic for injecting shorter requests into a running batch, which is not tested by the other test in this file
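As a sanity check on the numbers in this scenario, a sketch under the assumption (as I read the review) that the scheduler's constraint is `batch_size * tkv <= limit`:

```python
TKV_LIMIT = 131072  # VLLM_DT_MAX_BATCH_TKV_LIMIT used by the test

def fits(batch_size: int, tkv: int) -> bool:
    # Assumed constraint: the whole batch is padded to the longest tkv,
    # so batch_size * tkv must stay within the limit
    return batch_size * tkv <= TKV_LIMIT

# 16 requests growing to 8k tokens exactly fill the limit...
assert fits(16, 8192) and not fits(16, 8193)
# ...so a 17th, shorter request may only join if it is guaranteed to
# finish while the shared tkv is still at most TKV_LIMIT // 17 == 7710
assert fits(17, 7710) and not fits(17, 7711)
```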
Initial comments have been addressed. I have re-confirmed that the new test case fails without the scheduler.py changes and passes with them.
yannicks1 left a comment
To me it looks like this bug was (at least partially) introduced in #913 (please correct me if I am wrong). Prior to 913 we passed n_blocks, which carried information about the new request, into the function, and the calculation was based on it. While we fixed an edge case in 913, the new approach did not use n_blocks, so the new request was no longer considered.
With this PR we consider the new request again while still fixing the edge-case bug.
```python
def round_up_to_block_size(n: int) -> int:
    # Helper function to round up to the nearest block size
    # Uses bitwise alignment for better performance
    return (n + 63) & ~63
```
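A quick equivalence check (my own, not part of the PR) that the hardcoded bitwise form matches plain ceiling arithmetic for block size 64:

```python
def round_up_bitwise(n: int) -> int:
    return (n + 63) & ~63

def round_up_arith(n: int) -> int:
    # ceil(n / 64) * 64 using integer arithmetic
    return -(-n // 64) * 64

# The two agree for all non-negative n precisely because 64 is a power of two
assert all(round_up_bitwise(n) == round_up_arith(n) for n in range(4096))
```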
`n_blocks` is unused, please remove
Could you quickly run your newly added test (the one that captured the new failure) against a pre-913 commit? This would help me understand better...
Signed-off-by: Daniel Schenker <[email protected]>
Force-pushed from 1cea8ad to 9b0b50d.
This needs to be gone too.
Description
This PR addresses a bug in the scheduler code where the scheduler would allow requests that would eventually exceed the max batch tkv limit during an ongoing batch. The PR includes a new test case to reproduce the error and the fix for said error.
Related Issues
Test Plan
Run the test file with:

```shell
pytest tests/v1/worker/test_scheduler_tkv_limits.py -sv
```

The second test case will pass with the changes to sendnn_inference/v1/core/scheduler.py included in this PR, and will fail without them.

Checklist
- Code formatted (`bash format.sh`)
- Commits include a `Signed-off-by:` line (DCO compliance)