Fix prefix caching is currently not supported with sliding window attention when using qwen1.5 #3377
Conversation
cadedaniel left a comment:
Thanks for the fix!
vllm/config.py (Outdated)
```diff
 def get_sliding_window(self) -> Optional[int]:
-    return getattr(self.hf_config, "sliding_window", None)
+    return (getattr(self.hf_config, "sliding_window", None)
+            if self.get_use_sliding_window() else None)
+
+def get_use_sliding_window(self) -> bool:
+    return getattr(self.hf_config, "use_sliding_window", False)
```
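With this change, `get_sliding_window()` returns `None` for Qwen1.5 checkpoints whose config defines `sliding_window` but leaves `use_sliding_window` at false, so the prefix-caching assertion in model_runner.py no longer fires for them.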
- Can we remove the public function `get_use_sliding_window` unless we need it in the codebase?
- Can we add some docstrings here?
- Can we add a unit test for this functionality?
Of course, I will submit unit tests and some docstrings later.
@cadedaniel I have submitted unit tests and docstrings.
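A test along these lines could look like the sketch below. This is illustrative, not the actual test that landed in the PR; `ModelConfig.__init__` normally pulls the config from the HF hub, so the sketch bypasses it and attaches a stand-in `hf_config` whose fields mirror Qwen1.5's config.json keys.

```python
# Sketch of a unit test for the patched getters; the test name and the
# SimpleNamespace stand-in for the HF config are assumptions.
from types import SimpleNamespace

from vllm.config import ModelConfig


def test_get_sliding_window_respects_use_sliding_window():
    model_config = ModelConfig.__new__(ModelConfig)  # skip heavyweight __init__
    model_config.hf_config = SimpleNamespace(sliding_window=4096,
                                             use_sliding_window=False)

    # Qwen1.5 ships a sliding_window value but disables it by default,
    # so the patched getter should report no sliding window.
    assert model_config.get_sliding_window() is None

    # With the flag enabled, the configured window size is returned.
    model_config.hf_config.use_sliding_window = True
    assert model_config.get_sliding_window() == 4096
```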
…emory (OOM) issues.
Hi @a516072575, sorry, it seems #3373 got in a little faster.

Good catch -- created #3437, can you review?
My model is trained with Qwen1.5. When I use prefix caching for offline batch inference, the program throws `AssertionError: Prefix caching is currently not supported with sliding window attention`. I noticed that in vllm/config.py, the `get_sliding_window` function does not take into account whether `use_sliding_window` is set to true.
`use_sliding_window` is a new configuration option introduced in Qwen1.5 that controls whether a sliding window is used, and it defaults to false. So I added a check on the value of `use_sliding_window` in `get_sliding_window`, similar to the implementation in qwen2.py.
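For reference, the guard in the qwen2.py model code follows roughly this pattern (a paraphrased sketch, not the exact upstream source):

```python
# Paraphrased sketch of the qwen2.py-style guard: the sliding-window size
# is only honored when the config explicitly enables it.
sliding_window = (config.sliding_window
                  if getattr(config, "use_sliding_window", False) else None)
```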
The full AssertionError traceback is as follows:
File "/test.py", line 45, in outputs = model.generate(prompts, sampling_params, prefix_pos=[prefix_pos] * len(prompts)) File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 182, in generate return self._run_engine(use_tqdm) File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 208, in _run_engine step_outputs = self.llm_engine.step() File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 838, in step all_outputs = self._run_workers( File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 1041, in _run_workers driver_worker_output = getattr(self.driver_worker, File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs) File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 223, in execute_model output = self.model_runner.execute_model(seq_group_metadata_list, File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs) File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 571, in execute_model lora_mapping) = self.prepare_input_tensors(seq_group_metadata_list) File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 490, in prepare_input_tensors lora_requests) = self._prepare_prompt(seq_group_metadata_list) File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 193, in _prepare_prompt assert prefix_len == 0, ( AssertionError: Prefix caching is currently not supported with sliding window attention