
Conversation


@gyou2021 gyou2021 commented Apr 18, 2025

  1. Optimized MoE on Gaudi.
  2. Enabled EP (expert parallelism) on Gaudi.

=== update 05/13 ===
The current implementation needs refactoring so that both bf16 and fp8 static MoE work.

TODO:

  1. Move static MoE into a new module.
     Not needed: per consultation with the INC engineer, static MoE does not have to be moved into a new module.

@wpyszka wpyszka left a comment

lgtm

@gyou2021 gyou2021 requested a review from jikunshang as a code owner May 1, 2025 16:12
Signed-off-by: gyou2021 <[email protected]>
@gyou2021 gyou2021 requested a review from mswiniarsk as a code owner May 7, 2025 11:05
current_hidden_states_static = torch.matmul(
    current_state_static, self.w2_weight.transpose(1, 2)) * padded_weights
final_hidden_states = current_hidden_states_static.sum(dim=0)
Contributor

How about wrapping static_moe as a function and invoking it here?

Author

The INC engineer suggested not to wrap it.
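
For context, a minimal sketch of the static MoE computation being discussed, under the assumption that w13_weight and w2_weight are the stacked per-expert weights and padded_weights holds the router weights (zero for tokens not routed to an expert); names beyond those visible in the snippet above are illustrative:

import torch
import torch.nn.functional as F

def static_moe(hidden_states, w13_weight, w2_weight, padded_weights):
    # hidden_states: (tokens, hidden); w13_weight: (experts, 2*inter, hidden)
    # w2_weight: (experts, hidden, inter); padded_weights: (experts, tokens, 1)
    up_gate = torch.matmul(hidden_states, w13_weight.transpose(1, 2))
    gate, up = up_gate.chunk(2, dim=-1)
    current_state_static = F.silu(gate) * up
    # Every expert processes every token; the router weights mask the result.
    current_hidden_states_static = torch.matmul(
        current_state_static, w2_weight.transpose(1, 2)) * padded_weights
    return current_hidden_states_static.sum(dim=0)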

@xuechendi xuechendi marked this pull request as draft May 13, 2025 15:07
        final_hidden_states = slice_final_hidden_states
    else:
        final_hidden_states += slice_final_hidden_states
else:
Contributor

don't use else. wrap static moe into a function and do:

if ...:
    return static_MOE()
# existing code
...

Author

Modified.

Contributor

Could you switch the sequence? Do:

if ...:
    return static_MOE()

# existing code
dynamic_moe()

instead of:

if ...:
    dynamic_moe()
else:
    return static_MOE()
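
A minimal sketch of the suggested early-return shape; the helper names (_should_use_static_moe, _static_moe, _dynamic_moe) are illustrative, not the PR's actual identifiers:

def forward(self, hidden_states, router_logits):
    # Hypothetical guard; in the PR this is presumably a token-count check.
    if self._should_use_static_moe(hidden_states):
        return self._static_moe(hidden_states, router_logits)

    # The existing dynamic MoE code stays at the top level; no else branch needed.
    return self._dynamic_moe(hidden_states, router_logits)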

gyou2021 added 2 commits June 4, 2025 09:37
Signed-off-by: gyou2021 <[email protected]>
Signed-off-by: gyou2021 <[email protected]>
# dynamic MoE is used since its performance is better than
# static MoE in this case.
self.dynamic_moe_min_tokens = int(
    os.environ.get("VLLM_DYNAMIC_MOE_MIN_TOKENS", 256))
Contributor

Isn't this duplicated with lines 466-482?
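
For reference, the threshold above comes from the VLLM_DYNAMIC_MOE_MIN_TOKENS environment variable (default 256) and can be overridden before the engine is constructed; the value 512 below is purely illustrative:

import os

# Presumably batches below this token count take the static MoE path and
# larger batches take dynamic MoE; 512 is only an example override.
os.environ["VLLM_DYNAMIC_MOE_MIN_TOKENS"] = "512"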
