Merged
2 changes: 1 addition & 1 deletion python/sglang/srt/operations_strategy.py
@@ -127,9 +127,9 @@ def _compute_moe_deepseek_blog_decode(layer):
layer.mlp.op_combine_a,
operations.YieldOperation(),
layer.mlp.op_combine_b,
+ operations.YieldOperation(),
layer.mlp.op_output,
layer.op_comm_postprocess_layer,
Comment on lines 129 to 132
medium

This change refines the operation staging for the DeepSeek decode path:

  • A YieldOperation is introduced after layer.mlp.op_combine_b.
  • The final YieldOperation after layer.op_comm_postprocess_layer is removed.

This effectively isolates op_combine_b into its own stage and groups op_output with op_comm_postprocess_layer in the new final stage. The total number of stages for decode operations increases from 5 to 6.
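The regrouping described above can be sketched with a toy splitter. This is only an illustration, not the sglang implementation: `YieldOperation` here is a hypothetical stand-in class, the strings stand in for the bound layer methods, only the tail of the operation list visible in this hunk is modelled, and the sketch assumes a trailing yield does not create an extra empty stage.

```python
class YieldOperation:
    """Hypothetical stand-in for operations.YieldOperation: marks a stage boundary."""


def split_into_stages(ops):
    """Group operations into stages, closing the current stage at each yield marker.

    Assumption: a yield with nothing before it (or a trailing yield)
    does not produce an empty stage.
    """
    stages, current = [], []
    for op in ops:
        if isinstance(op, YieldOperation):
            if current:
                stages.append(current)
            current = []
        else:
            current.append(op)
    if current:
        stages.append(current)
    return stages


Y = YieldOperation
# Tail of the list before the change: the last yield follows op_comm_postprocess_layer.
before = ["op_combine_a", Y(), "op_combine_b", "op_output", "op_comm_postprocess_layer", Y()]
# Tail of the list after the change: the yield moves to right after op_combine_b.
after = ["op_combine_a", Y(), "op_combine_b", Y(), "op_output", "op_comm_postprocess_layer"]

print(split_into_stages(before))
print(split_into_stages(after))
```

Under these assumptions the visible tail goes from two stages to three: `op_combine_b` ends up alone, and `op_output` shares the last stage with `op_comm_postprocess_layer`. In the real list, `op_combine_a` may share its stage with earlier operations that are not shown in this hunk.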

Could you provide more details on the specific kernels being overlapped and the expected performance benefits from this new staging? For instance, is op_combine_b (which involves deepep_dispatcher.combine_b) a communication-heavy step where yielding immediately after offers significant overlap opportunities with other batch processing?

Understanding the rationale will help in assessing the impact, especially since tbo_delta_stages remains 2.
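To make the interaction with `tbo_delta_stages` concrete, here is a rough sketch of how two micro-batches might interleave. It assumes that `tbo_delta_stages` means the first micro-batch starts that many stages ahead of the second and that the two then alternate one stage at a time; the function name `overlapped_schedule` and the batch labels are hypothetical, and the actual scheduler in sglang may behave differently.

```python
def overlapped_schedule(num_stages, delta_stages):
    """Return the order in which (batch, stage_index) pairs would run.

    Assumption: batch "A" runs `delta_stages` stages on its own first,
    then A and B alternate, then B finishes its remaining stages.
    """
    schedule = []
    # Batch A gets a head start.
    for s in range(delta_stages):
        schedule.append(("A", s))
    # A and B alternate; B trails A by `delta_stages` stages.
    for s in range(delta_stages, num_stages):
        schedule.append(("A", s))
        schedule.append(("B", s - delta_stages))
    # Batch B drains its last stages.
    for s in range(num_stages - delta_stages, num_stages):
        schedule.append(("B", s))
    return schedule


# Six stages (the count stated in this comment) with a delta of two.
for batch, stage in overlapped_schedule(num_stages=6, delta_stages=2):
    print(batch, stage)
```

With a fixed delta, changing where the yields sit changes which stage of one micro-batch runs next to which stage of the other, which is why the rationale for the new boundary matters.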

- operations.YieldOperation(),
],
)
