Conversation

@ILikeIneine
Collaborator

Purpose

This PR adds support for vllm v0.11.1.

Test Plan

Test Result

(Optional) Documentation Update


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

@ILikeIneine ILikeIneine self-assigned this Oct 27, 2025
@ILikeIneine ILikeIneine marked this pull request as draft October 27, 2025 07:52
@gemini-code-assist bot left a comment

Code Review

This pull request updates the codebase to support vllm v0.11.1, which involves significant refactoring around memory allocation, platform integration, and attention mechanisms. The changes appear to align with the goal of supporting the new vllm version. I have found one critical issue in the device allocator patch that could lead to a runtime error and have provided a fix.

Comment on lines +48 to +53
if len(self._sleep_saved_buffers):
    model = self.model_runner.model
    for name, buffer in model.named_buffers():
        if name in self._sleep_saved_buffers:
            buffer.data.copy_(self._sleep_saved_buffers[name].data)
    self._sleep_saved_buffers = {}

critical

There is a potential AttributeError here. The self._sleep_saved_buffers attribute is only initialized within the sleep method, and only when level == 2. If wake_up is called after sleep(level=1) or before any call to sleep, self._sleep_saved_buffers will not exist on the object, causing a crash when len() is called on it.

To prevent this, you should safely check for the attribute's existence before trying to access it.

Suggested change
-    if len(self._sleep_saved_buffers):
+    if hasattr(self, "_sleep_saved_buffers") and self._sleep_saved_buffers:
         model = self.model_runner.model
         for name, buffer in model.named_buffers():
             if name in self._sleep_saved_buffers:
                 buffer.data.copy_(self._sleep_saved_buffers[name].data)
         self._sleep_saved_buffers = {}
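
As an alternative to the hasattr() guard, the attribute could also be initialized unconditionally in the worker's constructor so that sleep() and wake_up() always see a consistent state. A minimal sketch of that idea, assuming a hypothetical PatchedWorker class (only _sleep_saved_buffers, model_runner, and the wake_up logic are taken from the snippet above; everything else is illustrative):

import torch


class PatchedWorker:
    """Hypothetical sketch; only the sleep-buffer bookkeeping is shown."""

    def __init__(self, model_runner):
        self.model_runner = model_runner
        # Initialized up front, so wake_up() never raises AttributeError,
        # even if sleep() was skipped or called with level != 2.
        self._sleep_saved_buffers: dict[str, torch.Tensor] = {}

    def wake_up(self, tags=None):
        if self._sleep_saved_buffers:
            model = self.model_runner.model
            for name, buffer in model.named_buffers():
                if name in self._sleep_saved_buffers:
                    buffer.data.copy_(self._sleep_saved_buffers[name].data)
            self._sleep_saved_buffers = {}

Either approach removes the crash path; the hasattr() variant has the advantage of not touching the constructor, which matters if the patch only overrides wake_up().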

@leex404 leex404 force-pushed the support-vllm-0.11.1 branch 2 times, most recently from ab31312 to f516af8 Compare November 4, 2025 08:35
ILikeIneine and others added 18 commits November 7, 2025 14:09
Signed-off-by: Hank <[email protected]>
Signed-off-by: Hank <[email protected]>
Signed-off-by: Hank <[email protected]>
…l` (#115)

* [fix] fix sample_recovered_tokens_kernel using too much private memory

Signed-off-by: Xin Li <[email protected]>

* [fix] fix type error in bf16_paged_mqa_logits

Signed-off-by: Xin Li <[email protected]>

* [chore] change file directory

Signed-off-by: Xin Li <[email protected]>

---------

Signed-off-by: Xin Li <[email protected]>
Co-authored-by: Xin Li <[email protected]>

Signed-off-by: leex404 <[email protected]>
@ILikeIneine ILikeIneine marked this pull request as ready for review November 14, 2025 02:09
@ILikeIneine ILikeIneine merged commit 0a392da into master Nov 14, 2025
2 of 4 checks passed
@ILikeIneine ILikeIneine changed the title [WIP] support v0.11.1 feat!: support v0.11.1 Nov 14, 2025
@ILikeIneine ILikeIneine deleted the support-vllm-0.11.1 branch November 24, 2025 07:26
@ILikeIneine ILikeIneine restored the support-vllm-0.11.1 branch November 24, 2025 07:26
@ILikeIneine ILikeIneine deleted the support-vllm-0.11.1 branch December 1, 2025 10:32