@youkaichao
Member

Prior to this PR, broadcast_tensor_dict only worked with cuda tensors.

This PR makes broadcast_tensor_dict work with both cuda tensors and cpu tensors.

This is useful when some metadata lives in cpu tensors, e.g. blocks_to_swap_in and blocks_to_swap_out, which are to be introduced in #4659.

Note: blocks_to_copy is still a cuda tensor, because the source and target of a copy both live on the GPU, and we have a dedicated copy kernel for it. blocks_to_swap_in and blocks_to_swap_out have to be cpu tensors, because they are kernel launch arguments.
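To illustrate the idea, here is a minimal sketch of how entries in a tensor dict might be routed by device. It is not the code merged in this PR: plan_broadcast, fake_tensor, and the route names "cpu_group", "device_group", and "metadata" are hypothetical, and the stand-in objects only mimic the `.device.type` attribute of a torch.Tensor so the example runs without PyTorch or a process group.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Tuple


def fake_tensor(device_type: str) -> SimpleNamespace:
    """Hypothetical stand-in that exposes .device.type like torch.Tensor."""
    return SimpleNamespace(device=SimpleNamespace(type=device_type))


def plan_broadcast(tensor_dict: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Assign each entry to a (hypothetical) communication route.

    - values without a .device attribute are treated as plain metadata
    - cpu tensors go to a CPU-side group
    - everything else goes to the device-side group
    """
    plan: List[Tuple[str, str]] = []
    for key, value in tensor_dict.items():
        device = getattr(value, "device", None)
        if device is None:
            plan.append((key, "metadata"))
        elif device.type == "cpu":
            plan.append((key, "cpu_group"))
        else:
            plan.append((key, "device_group"))
    return plan


# Names taken from the PR description; the values are stand-ins.
example = {
    "blocks_to_swap_in": fake_tensor("cpu"),
    "blocks_to_swap_out": fake_tensor("cpu"),
    "blocks_to_copy": fake_tensor("cuda"),
    "num_seq_groups": 4,
}
routes = dict(plan_broadcast(example))
```

In a real setup, each route would presumably map to a broadcast over a matching process group; the sketch only shows the per-entry decision.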

@youkaichao youkaichao requested a review from zhuohan123 May 7, 2024 22:18
Member

@zhuohan123 zhuohan123 left a comment

LGTM!

@youkaichao youkaichao merged commit cc466a3 into vllm-project:main May 8, 2024
@youkaichao youkaichao deleted the split_broadcast branch May 8, 2024 02:36
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request May 8, 2024
[Core][Distributed] support both cpu and device tensor in broadcast tensor dict (vllm-project#4660)
robertgshaw2-redhat pushed a commit to neuralmagic/nm-vllm that referenced this pull request May 19, 2024
dtrifiro pushed a commit to dtrifiro/vllm that referenced this pull request May 21, 2024