
Conversation


@gyou2021 gyou2021 commented Apr 18, 2025

  1. Optimized MoE on Gaudi.
  2. Enabled EP (expert parallelism) on Gaudi.

=== update 05/13 ===
The current implementation needs refactoring so that both bf16 and fp8 static MoE work.

TODO:

  1. Move static MoE into a new module.
     Not needed: per consultation with the INC engineer, static MoE does not have to be moved into a new module.

@wpyszka wpyszka left a comment

lgtm

@gyou2021 gyou2021 requested a review from jikunshang as a code owner May 1, 2025 16:12
Signed-off-by: gyou2021 <[email protected]>
@gyou2021 gyou2021 requested a review from mswiniarsk as a code owner May 7, 2025 11:05
current_hidden_states_static = torch.matmul(
    current_state_static, self.w2_weight.transpose(1, 2)) * padded_weights
final_hidden_states = current_hidden_states_static.sum(dim=0)
Contributor

How about wrapping static_moe as a function and invoking it here?

Author

The INC engineer suggested not to wrap it.
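
For context, a minimal sketch of the static MoE computation being discussed, under the assumption that w13_weight and w2_weight are the stacked per-expert weights and padded_weights holds the router weights (zero for tokens not routed to an expert); names beyond those visible in the snippet above are illustrative:

import torch
import torch.nn.functional as F

def static_moe(hidden_states, w13_weight, w2_weight, padded_weights):
    # hidden_states: (tokens, hidden); w13_weight: (experts, 2*inter, hidden)
    # w2_weight: (experts, hidden, inter); padded_weights: (experts, tokens, 1)
    up_gate = torch.matmul(hidden_states, w13_weight.transpose(1, 2))
    gate, up = up_gate.chunk(2, dim=-1)
    current_state_static = F.silu(gate) * up
    # Every expert processes every token; the router weights mask the result.
    current_hidden_states_static = torch.matmul(
        current_state_static, w2_weight.transpose(1, 2)) * padded_weights
    return current_hidden_states_static.sum(dim=0)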

@xuechendi xuechendi marked this pull request as draft May 13, 2025 15:07
        final_hidden_states = slice_final_hidden_states
    else:
        final_hidden_states += slice_final_hidden_states
else:
Contributor

don't use else. wrap static moe into a function and do:

if ...:
    return static_MOE()
# existing code
...

Author

Modified.

Contributor

Could you switch the sequence? Do:

if ...:
    return static_MOE()

# existing code
dynamic_moe()

instead of:

if ...:
    dynamic_moe()
else:
    return static_MOE()
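
A minimal sketch of the suggested early-return shape; the helper names (_should_use_static_moe, _static_moe, _dynamic_moe) are illustrative, not the PR's actual identifiers:

def forward(self, hidden_states, router_logits):
    # Hypothetical guard; in the PR this is presumably a token-count check.
    if self._should_use_static_moe(hidden_states):
        return self._static_moe(hidden_states, router_logits)

    # The existing dynamic MoE code stays at the top level; no else branch needed.
    return self._dynamic_moe(hidden_states, router_logits)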

gyou2021 added 2 commits June 4, 2025 09:37
Signed-off-by: gyou2021 <[email protected]>
Signed-off-by: gyou2021 <[email protected]>
# dynamic MoE is used since its performance is better than
# static MoE in this case.
self.dynamic_moe_min_tokens = int(
    os.environ.get("VLLM_DYNAMIC_MOE_MIN_TOKENS", 256))
Contributor

Isn't this duplicated with lines 466-482?
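
For reference, the threshold above comes from the VLLM_DYNAMIC_MOE_MIN_TOKENS environment variable (default 256) and can be overridden before the engine is constructed; the value 512 below is purely illustrative:

import os

# Presumably batches below this token count take the static MoE path and
# larger batches take dynamic MoE; 512 is only an example override.
os.environ["VLLM_DYNAMIC_MOE_MIN_TOKENS"] = "512"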
