
Fix topk inference performance reduce #6474

Merged
zhyncs merged 2 commits into sgl-project:main from lambert0312:fix_topk_inference_performance_reduce on May 23, 2025

Conversation

lambert0312 (Contributor) commented May 21, 2025

Motivation

When the following logic is added to topk.py, inference performance degrades significantly:

torch.compile(
    _mask_topk_ids_padded_region, dynamic=True, backend=get_compiler_backend()
)(topk_ids, num_token_non_padded)

Run command:

python3 -m sglang.launch_server --model-path /path/to/DeepSeek-V3-0324 --trust-remote-code --host 0.0.0.0 --port 30000 --attention-backend flashinfer --n-share-experts-fusion 16 --tp 16 --dist-init-addr IP:20000 --nnodes 2 --node-rank 0

Ref: #6175

Modifications

Add a check on num_token_non_padded: if it is None, return the previous result directly, skipping the compiled masking step entirely.
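A minimal sketch of the guard described above, using pure-Python stand-ins (the real sglang code operates on torch tensors and wraps the masking kernel in torch.compile; the function names and masking body here are assumptions for illustration, not the actual source):

```python
def mask_topk_ids_padded_region(topk_ids, num_token_non_padded):
    # Illustrative masking: invalidate expert ids for padded rows.
    # (The real kernel is the torch.compile-wrapped function in topk.py.)
    for i in range(num_token_non_padded, len(topk_ids)):
        topk_ids[i] = [-1] * len(topk_ids[i])
    return topk_ids

def maybe_mask_padded_region(topk_ids, num_token_non_padded=None):
    # The fix: when num_token_non_padded is None there is no padding to
    # mask, so return the previous result immediately instead of paying
    # the overhead of the compiled masking path.
    if num_token_non_padded is None:
        return topk_ids
    return mask_topk_ids_padded_region(topk_ids, num_token_non_padded)
```

The early return means the common unpadded case never touches the compiled function, which is the source of the regression reported here.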

@zhyncs zhyncs merged commit 2f42749 into sgl-project:main May 23, 2025
0 of 37 checks passed
Layssy pushed a commit to Layssy/sglang-iaas that referenced this pull request Jun 9, 2025
xwu-intel pushed a commit to xwu-intel/sglang that referenced this pull request Jun 17, 2025
@lambert0312 lambert0312 deleted the fix_topk_inference_performance_reduce branch June 20, 2025 07:14