
tests: add bias testing to nvfp4 moe #2585

Merged
jimmyzho merged 5 commits into flashinfer-ai:main from jimmyzho:bias
Feb 20, 2026

Conversation

@jimmyzho
Contributor

@jimmyzho jimmyzho commented Feb 18, 2026

📌 Description

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Tests

    • Added a test exercising GEMM bias combinations for low-precision (FP4) mixture-of-experts (bias on GEMM1, GEMM2, or both).
    • Extended the test harness to validate GEMM biases across production, CUDA-graph, and reference/dequant paths.
  • Refactor

    • Extended argument/config surfaces to accept and propagate GEMM biases through all runtime and reference paths, ensuring biases are applied where relevant.

@gemini-code-assist
Contributor

Summary of Changes

Hello @jimmyzho, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the NVFP4 Mixture of Experts (MoE) implementation by adding comprehensive bias support. The changes involve modifying the argument structures to accommodate bias terms for both GEMM operations within the MoE layer and updating the dequantization and reference computation logic to correctly apply these biases. The primary impact is the introduction of new test cases that verify the accuracy of the NVFP4 MoE kernel when biases are present, ensuring the robustness and correctness of the MoE functionality under these conditions.

Highlights

  • Bias Parameter Integration: Added gemm1_bias and gemm2_bias parameters to the moe_args and moe_args_dequant classes to support bias terms in Mixture of Experts (MoE) operations.
  • Bias Application Logic: Implemented conditional logic within the run_moe_dequant function to apply gemm1_bias and gemm2_bias during the respective GEMM operations if they are provided.
  • Reference Function Updates: Updated several reference functions (run_moe_reference_fp4, dequant_reference_dsfp8, run_moe_reference_per_tensor_scale_fp8, run_moe_reference_bf16, dequantize) to correctly pass the new bias parameters to the dequantization process.
  • New Bias Test Cases: Introduced a new helper function _run_fp4_moe_with_bias and three new parameterized test functions (test_nvfp4_moe_gemm2_bias, test_nvfp4_moe_gemm1_bias, test_nvfp4_moe_both_biases) to thoroughly validate the bias functionality for NVFP4 MoE layers.
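The conditional bias application described in the highlights above can be sketched as follows. This is a minimal NumPy stand-in, not the actual run_moe_dequant API: the function name, shapes, and call pattern here are illustrative assumptions only.

```python
import numpy as np

# Illustrative sketch: a reference GEMM that applies an optional bias only
# when one is provided, mirroring the conditional logic described above.
def gemm_with_optional_bias(x, weight, bias=None):
    # x: [tokens, in_dim], weight: [out_dim, in_dim], bias: [out_dim] or None
    out = x @ weight.T
    if bias is not None:
        out = out + bias  # broadcast the per-channel bias over all tokens
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w = rng.standard_normal((6, 8))
b = rng.standard_normal(6)

baseline = gemm_with_optional_bias(x, w)        # bias omitted -> plain GEMM
biased = gemm_with_optional_bias(x, w, bias=b)  # bias applied conditionally
assert np.allclose(biased - baseline, np.broadcast_to(b, (4, 6)))
```

The same pattern extends naturally to per-expert biases by indexing the bias tensor with the expert id before the add.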


Changelog
  • tests/moe/test_trtllm_gen_fused_moe.py
    • Added gemm1_bias and gemm2_bias parameters to moe_args and moe_args_dequant constructors.
    • Modified run_moe_dequant to apply gemm1_bias and gemm2_bias if provided.
    • Updated run_moe_reference_fp4, dequant_reference_dsfp8, run_moe_reference_per_tensor_scale_fp8, run_moe_reference_bf16, and dequantize to pass bias parameters.
    • Introduced _run_fp4_moe_with_bias helper function for bias testing.
    • Added test_nvfp4_moe_gemm2_bias, test_nvfp4_moe_gemm1_bias, and test_nvfp4_moe_both_biases for comprehensive bias validation.

@coderabbitai
Contributor

coderabbitai bot commented Feb 18, 2026

No actionable comments were generated in the recent review. 🎉


📝 Walkthrough

Walkthrough

Adds runtime-configurable GEMM bias support for FP4 MoE by introducing gemm1_bias and gemm2_bias, wiring them through argument containers, runtime/config propagation, CUDA-graph invocation, and reference/dequant paths; includes tests exercising bias combinations.

Changes

Cohort / File(s): FP4 MoE bias wiring & tests — tests/moe/test_trtllm_gen_fused_moe.py
Summary: added public fields gemm1_bias and gemm2_bias to moe_args and moe_args_dequant; propagated biases through the runtime config dict, CUDA-graph call sites, and all reference/dequant paths; applied biases after GEMM1 and GEMM2 when provided; added test_nvfp4_moe_gemm_bias to exercise gemm1/gemm2/both combos.
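The argument-container change summarized above can be sketched with a plain dataclass. The field names gemm1_bias and gemm2_bias follow the PR description; the container name, the other fields, and the commented shapes are simplified assumptions, not the real moe_args definition.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Illustrative stand-in for moe_args / moe_args_dequant with the new
# optional bias fields; biases default to None so existing call sites
# that never pass a bias keep working unchanged.
@dataclass
class MoeArgsSketch:
    hidden_states: Any = None
    gemm1_weights: Any = None
    gemm2_weights: Any = None
    gemm1_bias: Optional[Any] = None  # e.g. [num_experts, intermediate_size]
    gemm2_bias: Optional[Any] = None  # e.g. [num_experts, hidden_size]

args = MoeArgsSketch()
assert args.gemm1_bias is None and args.gemm2_bias is None  # biases off by default
```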

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

op: moe

Suggested reviewers

  • aleozlx
  • bkryu
  • cyx-6
  • yzh119
  • jiahanc

Poem

🐰 I nudged two biases into the flow,
GEMM1, GEMM2 — now onward they go,
Tests hop past gates with joyful cheer,
Tiny changes, big results near! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 73.33%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Description check — ❓ Inconclusive: the PR description is mostly a template with empty sections; the Description and Reviewer Notes sections lack actual content about the changes. Resolution: fill in the Description with details about what bias testing was added and why, and optionally add reviewer notes or concerns.

✅ Passed checks (1 passed)
  • Title check — ✅ Passed: the title accurately describes the main change, adding bias testing to the NVFP4 MoE code path.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds test coverage for bias support in the nvfp4 MoE implementation. The changes are well-contained within the test file and correctly add gemm1_bias and gemm2_bias to the reference implementations and new tests. The new tests for gemm1_bias, gemm2_bias, and both biases together are comprehensive. I have one suggestion to improve the maintainability of the new test code by refactoring duplicated logic.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/moe/test_trtllm_gen_fused_moe.py (2)

3195-3207: Set torch.random.manual_seed(0) before creating bias tensors for full reproducibility.

In all three test functions, torch.random.manual_seed(0) is called after the bias tensors are created via torch.randn(...). This means bias values depend on whatever RNG state was left by the prior parametrized test case. While the reference-vs-kernel comparison is still valid (both see the same bias), this makes individual test failures harder to reproduce in isolation.

Proposed fix (example for `test_nvfp4_moe_gemm2_bias`; apply analogously to the other two)
     num_experts, top_k = 8, 2
     device = "cuda"
 
+    torch.random.manual_seed(0)
     # gemm2_bias shape: [num_experts, hidden_size], dtype float32
     gemm2_bias = torch.randn(
         (num_experts, hidden_size), device=device, dtype=torch.float32
     )
 
-    torch.random.manual_seed(0)
     kernel_output, ref_output = _run_fp4_moe_with_bias(

Also applies to: 3234-3246, 3271-3287
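The reproducibility point above can be demonstrated with the stdlib random module standing in for torch.random (the helper name below is illustrative): values drawn after seeding are identical no matter what RNG state a prior test left behind.

```python
import random

# Seed *before* creating the bias values, as the review suggests; the
# result is then independent of any RNG state left by earlier tests.
def make_bias_seed_first():
    random.seed(0)  # seed first, then draw
    return [random.random() for _ in range(4)]

random.random()                   # simulate leftover state from a prior test
first = make_bias_seed_first()
random.random(); random.random()  # different leftover state this time
second = make_bias_seed_first()
assert first == second            # seeding first makes the bias reproducible
```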


3138-3138: Inconsistent weight_processing dict key: "shuffle" instead of "use_shuffled_weight".

All other call sites (e.g., run_moe_test at Line 2545, FP8BlockScaleMoe.prepare_static_weights_for_kernel at Line 946) use "use_shuffled_weight" as the key. FP4Moe.prepare_static_weights_for_kernel happens to ignore the weight_processing parameter entirely, so this doesn't cause a runtime failure today, but it would silently break if FP4 ever starts using that dict.

Proposed fix
-        {"shuffle": True, "layout": WeightLayout.MajorK},
+        {"use_shuffled_weight": True, "layout": WeightLayout.MajorK},
Actionable comment (inline)

tests/moe/test_trtllm_gen_fused_moe.py, around lines 3172-3174: the test passes inconsistent types for routing_method_type and activation_type. The target function signature expects ints (routing_method_type: int = 0, activation_type: int = ActivationType.Swiglu.value), but some call sites supply enum objects (e.g., self.config["activation_type"]) while others use .value. Update all callers to pass the enum's integer value consistently (use .value for both activation_type and routing_method_type) so every call matches the signature.

Collaborator

@aleozlx aleozlx left a comment


looks good to me

The bot comments are reasonable to address, please take a look.

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
tests/moe/test_trtllm_gen_fused_moe.py (2)

571-572: Consider using .get() for backward-compatible kwargs access.

kwargs["gemm1_bias"] / kwargs["gemm2_bias"] will raise KeyError if any future caller of call_moe omits these. Using .get("gemm1_bias", None) is consistent with how enable_autotune is already handled in this same method.

♻️ Proposed fix
-        gemm1_bias = kwargs["gemm1_bias"]
-        gemm2_bias = kwargs["gemm2_bias"]
+        gemm1_bias = kwargs.get("gemm1_bias", None)
+        gemm2_bias = kwargs.get("gemm2_bias", None)

2186-2187: Bias propagation added to non-FP4 reference paths without corresponding production support.

run_moe_reference_dsfp8, run_moe_reference_bf16, run_moe_reference_per_tensor_scale_fp8, and run_moe_reference_mxint4 now forward gemm1_bias/gemm2_bias into run_moe_dequant, but their production counterparts (trtllm_fp8_block_scale_moe, trtllm_bf16_moe, etc.) do not accept or apply biases. Any future test that passes non-None biases with these quant modes will silently mismatch between reference and production outputs. Consider adding an assertion in those reference functions that biases are None if production doesn't support them, e.g.:

assert args.gemm1_bias is None and args.gemm2_bias is None, \
    "GEMM bias not supported for FP8/BF16/MxInt4 production kernels"

@jimmyzho jimmyzho requested a review from aleozlx February 19, 2026 23:02
@jimmyzho
Contributor Author

@aleozlx I just refactored the test so it now calls run_moe_test directly. Could you please take another look?

Collaborator

@aleozlx aleozlx left a comment


lgtm

@aleozlx
Collaborator

aleozlx commented Feb 20, 2026

/bot run

@aleozlx aleozlx added the run-ci label Feb 20, 2026
@flashinfer-bot
Collaborator

GitLab MR !334 has been created, and the CI pipeline #44471261 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #44471261: 14/20 passed

@jimmyzho jimmyzho removed the run-ci label Feb 20, 2026
@jimmyzho jimmyzho enabled auto-merge (squash) February 20, 2026 23:43
@jimmyzho jimmyzho merged commit 3000467 into flashinfer-ai:main Feb 20, 2026
18 checks passed
