-
Notifications
You must be signed in to change notification settings - Fork 54
Exceed TKV Limit Unit Test and Bugfix #913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
+86
−4
Merged
Changes from all commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
d9f1e96
Add tkv limit unit test
Daniel-Schenker 7a68e4c
Remove print
Daniel-Schenker 9590ea2
Actually remove print
Daniel-Schenker 199ca7d
Potential fix in scheduler.py
Daniel-Schenker 19ca7f9
Clean up unit test and add comments
Daniel-Schenker b8b9141
Format fixes
Daniel-Schenker a7bbdc3
Update comments and format
Daniel-Schenker File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,82 @@ | ||
| """ | ||
| Tests for scheduler TKV limit enforcement. | ||
|
|
||
| This module tests the chunked prefill scheduler's ability to correctly | ||
| enforce the max_batch_tkv_limit constraint. | ||
| """ | ||
|
|
||
| import pytest | ||
| from scheduling_utils import create_request_for_scheduler_test, random_prompt | ||
| from v1.worker.mock_model import InstrumentedModelRunner | ||
| from spyre_util import REFERENCE_MODELS | ||
| from vllm_spyre.platform import SpyrePlatform | ||
|
|
||
|
|
||
| @pytest.mark.cpu | ||
| @pytest.mark.chunked_prefill | ||
| def test_scheduler_tkv_limits(monkeypatch: pytest.MonkeyPatch): | ||
| """ | ||
| Test that the scheduler correctly enforces the TKV limit constraint. | ||
|
|
||
| The test creates 5 requests with varying prompt lengths and a large | ||
| max_tokens value, then runs the scheduler to completion. With the bug | ||
| present, the scheduler incorrectly accepts a batch configuration that | ||
| violates the TKV limit of 131072. | ||
|
|
||
| Expected behavior (when bug is fixed): | ||
| - Scheduler should hold back the next request in queue | ||
| until it is guaranteed not to violate TKV limits later | ||
| - Test should pass without exceeding hardware constraints | ||
|
|
||
| Current behavior (with bug): | ||
| - Scheduler accepts invalid batch configurations | ||
| - Test will fail with assertion errors | ||
| """ | ||
| # Setup: Use the default test model | ||
| model = REFERENCE_MODELS[InstrumentedModelRunner.DEFAULT_TEST_MODEL] | ||
|
|
||
| # Build model runner with specific constraints | ||
| model_runner = InstrumentedModelRunner.build( | ||
| monkeypatch=monkeypatch, | ||
| max_num_batched_tokens=512, | ||
| max_num_seqs=32, | ||
| max_model_len=32768, | ||
| available_blocks=32768, | ||
| ) | ||
|
|
||
| # Configure the TKV limit | ||
| scheduler = model_runner.scheduler | ||
| scheduler.max_batch_tkv_limit = 131072 | ||
| SpyrePlatform._max_batch_tkv_limit = 131072 | ||
| monkeypatch.setenv("VLLM_DT_MAX_BATCH_TKV_LIMIT", "131072") | ||
|
|
||
| # Define prompt lengths for each request | ||
| # This combination is designed to trigger the TKV limit bug | ||
| prompt_lengths = [940, 412, 969, 949, 11946] | ||
| max_tokens = 16384 | ||
|
|
||
| # Create and add all requests to the scheduler | ||
| requests = [] | ||
| for request_id, prompt_length in enumerate(prompt_lengths): | ||
| prompt = random_prompt(model=model, seed=0, length=prompt_length) | ||
| request = create_request_for_scheduler_test( | ||
| model=model, | ||
| request_id=request_id, | ||
| add_step=0, | ||
| max_tokens=max_tokens, | ||
| prompt=prompt, | ||
| use_golden_token_injection=False, | ||
| generate_hf_results=False, | ||
| ).request | ||
| requests.append(request) | ||
| scheduler.add_request(request) | ||
|
|
||
| # Run the scheduler loop until all requests complete | ||
| # With the bug present, the scheduler will incorrectly accept a batch | ||
| # configuration that exceeds the TKV limit | ||
| while True: | ||
| sched_output = scheduler.schedule() | ||
| output = model_runner.execute_model(sched_output) | ||
| scheduler.update_from_output(sched_output, output) | ||
| if len(scheduler.running) == 0: | ||
| break | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this having an impact here, but maybe set this to 8192 as this is the max number of blocks we currently support? 32K blocks (of 64 tokens) seems like a looot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we did actually have
available_blocks=8192at first... But there was an error stating that 16385 blocks were needed. Maybe @joerunde can comment further on this.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, though we ended up with some startup errors because the max batch tkv limit wasn't being respected, which ended up crashing because we calculated a required 16k blocks 😢
We should probably understand why the env patch wasn't getting picked up, but that also doesn't need t block getting this in