fix: blockscale moe routine supports non-DS routing#2476

Merged
bkryu merged 2 commits into flashinfer-ai:main from hypdeb:patch-1 on Feb 3, 2026

Conversation


@hypdeb hypdeb commented Feb 3, 2026

📌 Description

The comment at this location previously stated that only float32 routing logits were supported for blockscale MoE. However, the launcher now appears to support bfloat16 for non-DeepSeek routing.

Summary by CodeRabbit

Bug Fixes

  • Per-method routing precision handling updated: the model now selects routing tensor precision based on the chosen routing method (e.g., DeepSeekV3 uses higher-precision), replacing fixed per-path precision. This improves routing stability and consistency across routing configurations without changing public APIs.

@gemini-code-assist

Summary of Changes

Hello @hypdeb, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an inaccuracy in the data type handling for Mixture-of-Experts (MoE) routing logits within the fp8_block_scale kernel. The change ensures that the benchmark routine correctly aligns with the launcher's current capabilities, allowing for bfloat16 usage in specific routing scenarios, thereby improving the fidelity of the benchmark to the actual runtime behavior.

Highlights

  • Blockscale MoE Routing Logits Data Type Correction: Updated the create_trtllm_moe_test_data function to accurately reflect the required data types for routing logits in fp8_block_scale MoE kernels. Previously, float32 was assumed to always be necessary, but the routine now correctly uses bfloat16 for non-DeepSeekV3 routing methods, while DeepSeekV3 continues to use float32.


Changelog
  • benchmarks/routines/moe.py
    • Modified the create_trtllm_moe_test_data function to introduce conditional logic for the routing_logits data type when moe_kernel_type is fp8_block_scale.
    • Implemented a check to set routing_logits to float32 if routing_method_type is 2 (DeepSeekV3), and to bfloat16 otherwise.
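
The conditional described in the changelog can be sketched as follows. This is a minimal, hypothetical rendering based only on this thread (routing method code 2 = DeepSeekV3), not the actual implementation in create_trtllm_moe_test_data:

```python
# Hedged sketch of the fp8_block_scale dtype selection described in the
# changelog. The routing-method code (2 == DeepSeekV3) is taken from this
# thread; the real logic lives in benchmarks/routines/moe.py.
def routing_logits_dtype(routing_method_type: int) -> str:
    # DeepSeekV3 routing still requires float32 logits; every other
    # routing method can now use bfloat16 with fp8_block_scale kernels.
    return "float32" if routing_method_type == 2 else "bfloat16"

print(routing_logits_dtype(2))  # float32
print(routing_logits_dtype(1))  # bfloat16
```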



coderabbitai bot commented Feb 3, 2026

📝 Walkthrough

Adjusts MoE benchmark test-data generation to choose the routing_logits dtype based on routing_method_type: for fp8 block-scale, DeepSeekV3 (type 2) uses float32 and all other methods use bfloat16; for fp8 per-tensor the logic is inverted for the DeepSeek method (bfloat16 for type 3, float32 otherwise).

Changes

Cohort / File(s): MOE Test Data Generation — benchmarks/routines/moe.py
Summary: Replaces hard-coded routing_logits dtypes with conditional selection based on routing_method_type across the FP8 block-scale and FP8 per-tensor paths (the DeepSeek-specific branches now set the routing dtype explicitly).
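
Both paths can be sketched together as below. The kernel-type strings and the method codes (2 = DeepSeekV3 for block-scale, 3 = the DeepSeek method for per-tensor) are assumptions read off this thread, not verified constants from the repository:

```python
# Hedged sketch of the per-kernel routing_logits dtype selection summarized
# above; kernel-type strings and routing-method codes are assumptions taken
# from this PR thread rather than from the repository itself.
def routing_logits_dtype(moe_kernel_type: str, routing_method_type: int) -> str:
    if moe_kernel_type == "fp8_block_scale":
        # DeepSeekV3 (type 2) keeps float32; other methods take bfloat16.
        return "float32" if routing_method_type == 2 else "bfloat16"
    if moe_kernel_type == "fp8_per_tensor":
        # Inverted relative to block-scale: the DeepSeek method (type 3)
        # takes bfloat16 here, and every other method uses float32.
        return "bfloat16" if routing_method_type == 3 else "float32"
    raise ValueError(f"unhandled moe_kernel_type: {moe_kernel_type!r}")
```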

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • Anerudhan
  • cyx-6
  • jiahanc


🚥 Pre-merge checks: 2 passed, 1 inconclusive

❓ Description check — Inconclusive: The description explains the change but lacks details on testing and pre-commit checks. It provides the motivation (the launcher now supports bfloat16 for non-DS routing) but omits required checklist items. Resolution: add sections for test updates and test results, and complete the pre-commit checklist to fully meet the repository's PR template requirements.
✅ Title check — Passed: The title accurately reflects the main change: blockscale MoE now supports non-DeepSeek routing with dtype handling based on routing method type.
✅ Docstring coverage — Passed: Docstring coverage is 100.00%, above the required 80.00% threshold.




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly updates the data type for routing logits in the fp8_block_scale MoE routine to support non-DeepSeek routing methods with bfloat16. I've suggested a minor refactoring to improve code clarity and reduce duplication within the modified block. Additionally, this change makes the logic for fp8_block_scale identical to the existing logic for fp4_block_scale. You might consider refactoring these two blocks together in a follow-up to further improve maintainability.


@bkryu bkryu left a comment


Thanks @hypdeb for the quick fix. Left one comment echoing Gemini's comment. Otherwise, this should be good to go.


@bkryu bkryu left a comment


LGTM

@bkryu bkryu merged commit 9e069e7 into flashinfer-ai:main Feb 3, 2026
20 checks passed
raayandhar pushed a commit to raayandhar/flashinfer that referenced this pull request Feb 5, 2026
…nfer-ai#2476)

@hypdeb hypdeb deleted the patch-1 branch March 3, 2026 08:01