
Conversation

@carrey-feng commented on Mar 13, 2024

My model is trained with Qwen1.5. When I use prefix caching for offline batch inference, the program throws `AssertionError: Prefix caching is currently not supported with sliding window attention`. I noticed that in vllm/config.py, the `get_sliding_window` function does not take into account whether `use_sliding_window` is set to true.

`use_sliding_window` is a new configuration option introduced in Qwen1.5 that controls whether sliding window attention is used, and it defaults to false. So I added a check on the value of `use_sliding_window` inside `get_sliding_window`, similar to the implementation in qwen2.py.

The AssertionError is raised with the following traceback:

File "/test.py", line 45, in outputs = model.generate(prompts, sampling_params, prefix_pos=[prefix_pos] * len(prompts)) File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 182, in generate return self._run_engine(use_tqdm) File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 208, in _run_engine step_outputs = self.llm_engine.step() File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 838, in step all_outputs = self._run_workers( File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 1041, in _run_workers driver_worker_output = getattr(self.driver_worker, File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs) File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 223, in execute_model output = self.model_runner.execute_model(seq_group_metadata_list, File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs) File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 571, in execute_model lora_mapping) = self.prepare_input_tensors(seq_group_metadata_list) File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 490, in prepare_input_tensors lora_requests) = self._prepare_prompt(seq_group_metadata_list) File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 193, in _prepare_prompt assert prefix_len == 0, ( AssertionError: Prefix caching is currently not supported with sliding window attention

@carrey-feng changed the title from "Fix prefix caching is currently not supported with sliding window attention by using qwen1.5" to "Fix prefix caching is currently not supported with sliding window attention when using qwen1.5" on Mar 13, 2024

@cadedaniel (Collaborator) left a comment:


Thanks for the fix!

vllm/config.py (outdated), comment on lines 234 to 239:

```diff
     def get_sliding_window(self) -> Optional[int]:
-        return getattr(self.hf_config, "sliding_window", None)
+        return (getattr(self.hf_config, "sliding_window", None)
+                if self.get_use_sliding_window() else None)
+
+    def get_use_sliding_window(self) -> bool:
+        return getattr(self.hf_config, "use_sliding_window", False)
```
Collaborator:

  • Can we remove the public function get_use_sliding_window unless we need it elsewhere in the codebase?
  • Can we add some docstrings here?
  • Can we add a unit test for this functionality? (See the sketch after this list.)
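For illustration only, a unit test along the lines requested might look like the following sketch. The helper that fakes `hf_config` is my assumption, not the test that was actually submitted:

```python
# Hypothetical test sketch (pytest style). It exercises only the two getters
# from the diff above, bypassing ModelConfig.__init__ via __new__ so the test
# does not depend on model downloads or the rest of the constructor.
from types import SimpleNamespace

from vllm.config import ModelConfig


def _config_with(**hf_fields):
    cfg = ModelConfig.__new__(ModelConfig)
    cfg.hf_config = SimpleNamespace(**hf_fields)
    return cfg


def test_sliding_window_off_when_flag_absent():
    # Qwen1.5-style config: sliding_window set, use_sliding_window left off.
    cfg = _config_with(sliding_window=4096)
    assert cfg.get_use_sliding_window() is False
    assert cfg.get_sliding_window() is None


def test_sliding_window_on_when_flag_enabled():
    cfg = _config_with(sliding_window=4096, use_sliding_window=True)
    assert cfg.get_use_sliding_window() is True
    assert cfg.get_sliding_window() == 4096
```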

@carrey-feng (Author):

> Can we add some docstrings here?

Of course, I will submit unit tests and some docstrings later.
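For reference, docstrings on the two getters might read roughly as below; the wording is illustrative, not what was eventually submitted:

```python
def get_sliding_window(self) -> Optional[int]:
    """Return the sliding window size, or None if it is disabled.

    Some models (e.g. Qwen1.5) carry a sliding_window value in their
    config but gate it behind use_sliding_window, which defaults to
    false, so the window size alone does not mean the feature is on.
    """
    return (getattr(self.hf_config, "sliding_window", None)
            if self.get_use_sliding_window() else None)

def get_use_sliding_window(self) -> bool:
    """Return whether the model config enables sliding window attention."""
    return getattr(self.hf_config, "use_sliding_window", False)
```

One caveat worth noting about the approach in the diff: defaulting `use_sliding_window` to false also turns the window off for model families whose configs set `sliding_window` without defining that flag at all (e.g. Mistral), so distinguishing "flag absent" from "flag set to false" matters.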

@cadedaniel self-assigned this on Mar 13, 2024
@carrey-feng (Author):

@cadedaniel I have submitted the unit tests and docstrings.

@carrey-feng requested a review from @cadedaniel on March 14, 2024 at 08:57
@cadedaniel (Collaborator):

Hi @a516072575, sorry, it seems #3373 got in a little faster.

@carrey-feng (Author):

> Hi @a516072575, sorry, it seems #3373 got in a little faster.

Alright, #3373 did indeed get in ahead of me, but its unit test class doesn't appear to actually be invoked when the tests run.

@cadedaniel (Collaborator):

Good catch -- created #3437, can you review?
