Speed up when having padding tokens in DeepEP#6175

Merged
zhyncs merged 28 commits into sgl-project:main from fzyzcjy:feat/padding_moe
May 17, 2025

@fzyzcjy
Collaborator

@fzyzcjy fzyzcjy commented May 10, 2025

Motivation

test

PYTHONUNBUFFERED=1 SGLANG_TORCH_PROFILER_DIR=/host_home/temp_sglang_server2local python3 -m sglang.launch_server --model-path /dev/shm/DeepSeek-R1 --trust-remote-code --dist-init-addr 192.168.0.55:5757 --nnodes 2 --node-rank ${MY_NODE_RANK} --tp-size ${num_gpu} --dp-size ${num_gpu} --enable-dp-attention --mem-fraction-static 0.8 --chunked-prefill-size $((128*${num_gpu})) --max-running-requests $((${num_gpu}*128)) --context-length 4096 --disable-radix-cache --enable-deepep-moe --deepep-mode low_latency --cuda-graph-bs 128 --decode-log-interval 1

python3 -m sglang.bench_one_batch_server --model-path /dev/shm/DeepSeek-R1 --base-url http://localhost:30000 --batch-size 16 --input-len 1 --output-len 2048 --skip-warmup
  • baseline: 6 tok/s/gpu
  • PR: 29 tok/s/gpu
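
The description does not say how the padding tokens are handled, so the following is only a rough sketch of one plausible approach, not the PR's actual code. It assumes that the rows a CUDA graph adds to reach its fixed batch size (`--cuda-graph-bs 128`) can be marked with a sentinel expert ID so that the expert dispatch skips them. The names `mask_padded_topk_ids`, `count_dispatched`, and `PADDED`, the top-k value of 8, and the placement of all 16 real tokens on a single rank are hypothetical choices made for illustration.

```python
# Hypothetical illustration; none of these names are taken from the PR.
PADDED = -1  # assumed sentinel meaning "do not dispatch this slot"


def mask_padded_topk_ids(topk_ids, num_token_non_padded):
    """Return a copy of topk_ids with every row at index >= num_token_non_padded set to PADDED."""
    return [
        list(row) if i < num_token_non_padded else [PADDED] * len(row)
        for i, row in enumerate(topk_ids)
    ]


def count_dispatched(topk_ids):
    """Count the (token, expert) pairs that would actually be sent to an expert."""
    return sum(1 for row in topk_ids for expert in row if expert != PADDED)


# Batch sizes from the commands above: graph batch of 128, 16 requests.
# Assumptions: all 16 real tokens sit on one rank, and each token picks 8 experts.
graph_bs, real_tokens, top_k = 128, 16, 8
topk_ids = [[(i + k) % 256 for k in range(top_k)] for i in range(graph_bs)]

masked = mask_padded_topk_ids(topk_ids, real_tokens)
print(count_dispatched(topk_ids))  # 1024 pairs when padding rows are routed too
print(count_dispatched(masked))    # 128 pairs when padding rows are skipped
print(round(29 / 6, 1))            # 4.8, the reported speedup (29 vs. 6 tok/s/gpu)
```

Under these assumptions, seven out of every eight routed slots belong to padding rows, which is consistent in direction with the gap between the baseline and PR numbers, though the real ratio depends on how requests are spread across ranks.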

Modifications

Checklist

@fzyzcjy fzyzcjy marked this pull request as draft May 12, 2025 00:04
@fzyzcjy fzyzcjy force-pushed the feat/padding_moe branch from 8797942 to 3fecc76 Compare May 12, 2025 00:09
@fzyzcjy fzyzcjy marked this pull request as ready for review May 12, 2025 00:09
@zhyncs zhyncs merged commit 2716830 into sgl-project:main May 17, 2025
113 of 128 checks passed
@lambert0312
Contributor

This PR significantly reduces DeepSeek's inference performance (by 15%+). We need to look into the specific reasons.

@fzyzcjy
Collaborator Author

fzyzcjy commented May 20, 2025

@lambert0312 That looks bad. Could you please share your commands? It would also be great to have a profile. My first guess is that we need to fuse it.

@lambert0312
Contributor

> @lambert0312 That looks bad. Could you please share your commands? It would also be great to have a profile. My first guess is that we need to fuse it.

@fzyzcjy I tried to modify it. You can see the PR I linked above. Thank you.

@fzyzcjy
Collaborator Author

fzyzcjy commented May 21, 2025

Interesting, I thought this line already ensured that no extra kernels are executed.

image

Layssy pushed a commit to Layssy/sglang-iaas that referenced this pull request Jun 9, 2025
xwu-intel pushed a commit to xwu-intel/sglang that referenced this pull request Jun 17, 2025

4 participants