feat: integrate deepgemm into EPMoE #5805
Changes from 9 commits
92d647c
e057acb
19ec50e
3ce1a91
3d51a71
c80fc3c
af94a8b
2022070
55ea483
988a522
| Diff line change |
|---|
| `@@ -20,8 +20,10 @@` |
| `get_tensor_model_parallel_world_size,` |
| `)` |
| `from sglang.srt.layers.moe.ep_moe.kernels import (` |
| `deepgemm_post_reorder_triton_kernel,` |
| `gelu_and_mul_triton_kernel,` |
| `grouped_gemm_triton,` |
| `moe_ep_deepgemm_preproess,` |
| `post_reorder_triton_kernel,` |
| `pre_reorder_triton_kernel,` |
| `run_moe_ep_preproess,` |
| `@@ -38,7 +40,13 @@` |
| `from sglang.srt.layers.quantization.fp8 import Fp8Config, Fp8MoEMethod` |
| `from sglang.srt.layers.quantization.fp8_kernel import scaled_fp8_quant` |
| `from sglang.srt.model_executor.forward_batch_info import ForwardMode` |
| `from sglang.srt.utils import DeepEPMode, is_hip, set_weight_attrs` |
| `from sglang.srt.utils import (` |
| `DeepEPMode,` |
| `get_bool_env_var,` |
| `is_cuda,` |
| `is_hip,` |
| `set_weight_attrs,` |
| `)` |
| |
| `_is_hip = is_hip()` |
| |
| `@@ -47,6 +55,8 @@` |
| |
| `logger = logging.getLogger(__name__)` |
| |
| `epmoe_use_deepgemm = get_bool_env_var("EPMOE_USE_DEEPGEMM")` |
| |
| `_ENABLE_JIT_DEEPGEMM = True` |
We might import it directly.
So, do you mean we should just replace EPMOE_USE_DEEPGEMM with _ENABLE_JIT_DEEPGEMM?
Yes. Enabling _ENABLE_JIT_DEEPGEMM would make DeepGEMM the default path in EPMoE.
Why disable EPMOE DeepGEMM when use_deep_gemm is enabled?
Maybe forward_deepgemm is called when use_deep_gemm is enabled.
Are there any cases where Triton GEMM in forward_normal outperforms DeepGEMM?
So far, I haven't found any case where Triton GEMM in forward_normal outperforms DeepGEMM, but DeepGEMM may use more GPU memory.
For clarity, we could remove epmoe_use_deepgemm and the corresponding environment variable EPMOE_USE_DEEPGEMM.
OK, done
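For readers following the thread, here is a rough sketch of what dispatching on a single flag (rather than a separate EPMOE_USE_DEEPGEMM environment variable) could look like. `ToyEPMoE`, its constructor argument, and the returned tuples are hypothetical and only illustrate the control flow; the method names `forward_deepgemm` and `forward_normal` and the attribute `use_deep_gemm` are taken from the comments above, and the real layer may condition `use_deep_gemm` on more than this one flag.

```python
# Hypothetical stand-in for the module-level switch discussed in the thread.
ENABLE_JIT_DEEPGEMM = True


class ToyEPMoE:
    """Toy layer (not the real EPMoE) showing one flag selecting the GEMM path."""

    def __init__(self, enable_jit_deepgemm: bool = ENABLE_JIT_DEEPGEMM):
        # Assumption: the flag alone decides the path. The actual layer may
        # also check quantization settings or hardware support.
        self.use_deep_gemm = enable_jit_deepgemm

    def forward(self, hidden_states):
        # Single decision point, no extra environment variable involved.
        if self.use_deep_gemm:
            return self.forward_deepgemm(hidden_states)
        return self.forward_normal(hidden_states)

    def forward_deepgemm(self, hidden_states):
        # Placeholder for the DeepGEMM grouped-GEMM path.
        return ("deepgemm", hidden_states)

    def forward_normal(self, hidden_states):
        # Placeholder for the Triton grouped-GEMM path.
        return ("triton", hidden_states)
```

With one switch, the Triton path stays reachable (for example, if the extra memory used by DeepGEMM becomes a problem) without keeping a second configuration knob in sync.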
Why does the variable num start from 2**2 = 4?