[Bugfix] neuron: enable tensor parallelism #7562

omrishiv · 2024-08-15T17:29:40Z


Following #7175, this enables tensor parallelism on Neuron with vLLM > 0.5.0. The block-size choices are also reverted to their previous values, and vLLM takes max-model-len as the block size on Neuron devices.
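The block-size rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code changed in this PR: `CacheSettings` and `resolve_block_size` are hypothetical names, and the device string `"neuron"` is used only for the example.

```python
from dataclasses import dataclass


@dataclass
class CacheSettings:
    """Hypothetical stand-in for the engine's cache configuration."""
    block_size: int


def resolve_block_size(device_type: str, requested_block_size: int,
                       max_model_len: int) -> CacheSettings:
    """Pick the KV-cache block size for a device (illustrative only)."""
    if device_type == "neuron":
        # Per the PR description: on Neuron the block size follows
        # max-model-len (not max_num_seqs, see commit 5b734fb).
        return CacheSettings(block_size=max_model_len)
    # Assumption: other devices keep whatever block size was requested.
    return CacheSettings(block_size=requested_block_size)
```

With this sketch, `resolve_block_size("neuron", 16, 2048)` yields a block size of 2048, while any other device keeps the requested value.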

Signed-off-by: omrishiv <[email protected]>

github-actions · 2024-08-15T17:29:50Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

Comment /ready on the PR
Add ready label to the PR
Enable auto-merge.

🚀

simon-mo

stamping args_utils changes.


liangfu

Thanks @omrishiv for fixing the PP issue. The change looks good overall. I left a comment requesting a minor change.

liangfu · 2024-08-22T20:48:02Z

vllm/worker/neuron_worker.py

        self.is_driver_worker = True

    def init_device(self) -> None:
+        os.environ["PJRT_DEVICE"] = "NEURON"


Help take this out?

omrishiv · 2024-08-24T00:34:20Z

@liangfu Thank you, that was a leftover from testing. Removed.
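When a variable such as `PJRT_DEVICE` is only needed while testing, one way to avoid leaving it hard-coded in `init_device` is to scope it to the test itself. The helper below is a hypothetical sketch using only the Python standard library; `temporary_env` is not part of vLLM.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name: str, value: str):
    """Set an environment variable for the duration of a block, then restore it."""
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        # Restore the prior state, removing the variable if it was unset before.
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous
```

A test could then wrap its setup in `with temporary_env("PJRT_DEVICE", "NEURON"):` so the setting does not leak into the worker code.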


enable tensor parallelism

b302b34

Signed-off-by: omrishiv <[email protected]>

simon-mo approved these changes Aug 15, 2024


simon-mo requested a review from liangfu August 15, 2024 17:30

omrishiv added 4 commits August 15, 2024 10:37

formatting

74dc724

Signed-off-by: omrishiv <[email protected]>

formatting

f648cc9

Signed-off-by: omrishiv <[email protected]>

formatting

b31f3db

Signed-off-by: omrishiv <[email protected]>

block_size should equal max_model_len on neuron, not max_num_seqs

5b734fb

Signed-off-by: omrishiv <[email protected]>

WoosukKwon added the aws-neuron label (Related to AWS Inferentia & Trainium) Aug 19, 2024

liangfu mentioned this pull request Aug 19, 2024

[Bug]: AssertionError in neuron_model_runner.py assert len(block_table) == 1 #4553

Closed

liangfu approved these changes Aug 23, 2024


Remove XLA variable

ea71d10

removing unused import

6cfdae3

simon-mo enabled auto-merge (squash) August 24, 2024 06:03

github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Aug 24, 2024

auto-merge was automatically disabled August 26, 2024 18:12
Head branch was pushed to by a user without write access

omrishiv force-pushed the neuron-tensor-parallelism branch from b00c3fe to 6cfdae3 Compare August 26, 2024 18:18

Merge branch 'vllm-project:main' into neuron-tensor-parallelism

0ed98d7

ywang96 merged commit 760e9f7 into vllm-project:main Aug 26, 2024

omrishiv deleted the neuron-tensor-parallelism branch August 26, 2024 22:13

omrishiv mentioned this pull request Aug 30, 2024

[Bugfix] update neuron for version > 0.5.0 #7175

Merged

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024

[Bugfix] neuron: enable tensor parallelism (vllm-project#7562)

f908fd9

Signed-off-by: omrishiv <[email protected]> Signed-off-by: Alvant <[email protected]>

AlpinDale mentioned this pull request Dec 18, 2024

neuron: add support for tensor parallelism aphrodite-engine/aphrodite-engine#923

Merged

LeiWang1999 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Mar 26, 2025

[Bugfix] neuron: enable tensor parallelism (vllm-project#7562)

5922c33

Signed-off-by: omrishiv <[email protected]> Signed-off-by: LeiWang1999 <[email protected]>
