Add test case for Qwen3N by samuellees · Pull Request #2532 · flashinfer-ai/flashinfer

samuellees · 2026-02-10T02:57:15Z

📌 Description

Add test cases for Qwen3N and Qwen3.5, following vllm-project/vllm#34131.

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

- I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- I have installed the hooks with `pre-commit install`.
- I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

- Tests have been added or updated as needed.
- All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

Tests
- Expanded test coverage by adding additional head-configuration cases across multiple test scenarios to improve reliability and catch more edge cases.
- No changes to test logic or public interfaces; only parameterized inputs were extended.

gemini-code-assist · 2026-02-10T02:57:30Z

Summary of Changes

Hello @samuellees, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the testing suite for the prefill kernel by incorporating additional attention head configurations. These new test cases are crucial for validating the kernel's compatibility and performance with advanced model architectures, such as Qwen3N and Qwen3.5, thereby improving the overall robustness and reliability of the system.

Highlights

- Expanded Test Coverage: Added new `num_q_heads`, `num_k_heads`, `num_v_heads` configurations, specifically `(16, 16, 32)` and `(16, 16, 64)`, to the `_test_prefill_kernel` and `test_prefill_kernel_basic` functions.
- Qwen Model Support: Introduced test cases relevant to Qwen3N and Qwen3.5 models by including their characteristic attention head configurations.
- Prefill Kernel Concatenation Tests: Extended the `test_prefill_kernel_concat` function with the new attention head configurations to ensure proper behavior during sequence concatenation.

Changelog

tests/gdn/test_prefill_delta_rule.py
- Expanded the num_q_heads, num_k_heads, num_v_heads parameter sets for _test_prefill_kernel and test_prefill_kernel_basic to include (16, 16, 32) and (16, 16, 64).
- Added (16, 16, 32) and (16, 16, 64) to the num_q_heads, num_k_heads, num_v_heads parameter set for test_prefill_kernel_concat.

Activity

No human activity has occurred on this pull request yet.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

coderabbitai · 2026-02-10T02:57:45Z

Walkthrough

Added two larger head-size parameter tuples—(16, 16, 32) and (16, 16, 64)—to parameterization decorators in prefill delta rule tests, expanding test coverage without changing test logic or public interfaces.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Test Parameter Expansion `tests/gdn/test_prefill_delta_rule.py` | Added head configuration tuples `(16, 16, 32)` and `(16, 16, 64)` to multiple pytest parametrize decorators for prefill-related tests. No code logic, assertions, or public signatures changed. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐰 New heads hop in, a joyful spree,
Two sizes more to test with glee,
(16,16,32) and (16,16,64) in view,
I nibble coverage, now broader and true,
Hooray for tests — a carrot or two! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The title mentions 'Qwen3N' but the actual changes only expand parametric test configurations (adding (16,16,32) and (16,16,64) head configurations) with no Qwen3N-specific implementation. | Update the title to accurately reflect the primary change: 'Expand parametric test configurations for prefill kernel' or similar. |
| Description check | ⚠️ Warning | The description mentions adding test cases for Qwen3N and Qwen3.5, but the actual code changes only expand existing test parametrization with additional head configurations; no new Qwen3N/3.5 test cases are present. | Clarify the description to match actual changes: either update the code to add Qwen3N/3.5 test cases as described, or revise the description to reflect the parametrization expansion. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments

tests/gdn/test_prefill_delta_rule.py (1)

411-412: Consider the CI impact of the expanded test matrix.

The chunked prefill test doubles its head-config count (2 → 4). Combined with the other two tests, total parameterized cases grow substantially. Given the CI pipeline is already failing (10/20 jobs), it may be worth confirming whether the failures are related to these larger configs (e.g., GPU memory or timeouts) or are pre-existing on main.
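The growth this comment refers to can be estimated by multiplying the sizes of the stacked `parametrize` lists, since pytest generates the Cartesian product of stacked decorators. The sketch below uses the six-to-eight head-configuration change shown in the diff in this thread; the other axes (`seq_len`, `dtype`) and the helper `case_count` are hypothetical placeholders, not the actual parameters in `tests/gdn/test_prefill_delta_rule.py`.

```python
from math import prod

# Head configurations before and after this PR, as shown in the diff in this thread.
HEAD_CONFIGS_BEFORE = [(1, 1, 1), (4, 1, 1), (3, 3, 3), (6, 2, 2), (1, 1, 2), (2, 2, 4)]
HEAD_CONFIGS_AFTER = HEAD_CONFIGS_BEFORE + [(16, 16, 32), (16, 16, 64)]

# Hypothetical stand-ins for the other parametrized axes (assumption, not from the PR).
OTHER_AXES = {
    "seq_len": [128, 1024, 4096],
    "dtype": ["float16", "bfloat16"],
}


def case_count(head_configs, other_axes):
    """Number of cases generated when parametrize decorators are stacked."""
    return len(head_configs) * prod(len(values) for values in other_axes.values())


before = case_count(HEAD_CONFIGS_BEFORE, OTHER_AXES)
after = case_count(HEAD_CONFIGS_AFTER, OTHER_AXES)
print(f"cases before: {before}, after: {after}, growth: {after / before:.2f}x")
```

Whatever the real axes are, the relative growth depends only on the head-configuration list: 8/6 for a list that goes from six to eight entries, and 2x for one that goes from two to four.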

gemini-code-assist

Code Review

This pull request adds new test cases for Grouped-Value Attention (GVA) configurations, likely for Qwen3N and Qwen3.5 models. The changes correctly expand test coverage. I've included one suggestion to refactor duplicated test parameters for improved maintainability.
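To make the GVA label concrete, here is a minimal sketch that sorts the `(num_q_heads, num_k_heads, num_v_heads)` tuples from this PR into categories. The helper `classify_heads` and its labels are illustrative assumptions: it treats a configuration as GVA when there are more value heads than query heads, and as GQA when there are more query heads than key heads. The test file itself does not define such a function.

```python
def classify_heads(num_q_heads, num_k_heads, num_v_heads):
    """Label a head configuration (illustrative convention, not a FlashInfer API)."""
    if num_q_heads == num_k_heads == num_v_heads:
        return "MHA"  # every head count matches
    if num_v_heads > num_q_heads and num_v_heads % num_q_heads == 0:
        return "GVA"  # several value heads per query/key head
    if num_q_heads > num_k_heads and num_q_heads % num_k_heads == 0:
        return "GQA"  # several query heads per key/value head
    return "other"


# Head configurations listed in the diff in this thread.
configs = [
    (1, 1, 1), (4, 1, 1), (3, 3, 3), (6, 2, 2),
    (1, 1, 2), (2, 2, 4), (16, 16, 32), (16, 16, 64),
]
for q, k, v in configs:
    print((q, k, v), classify_heads(q, k, v))
```

Under this convention the two new tuples, `(16, 16, 32)` and `(16, 16, 64)`, are both labeled GVA, with two and four value heads per query head respectively.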

gemini-code-assist · 2026-02-10T02:58:38Z

tests/gdn/test_prefill_delta_rule.py

 @pytest.mark.parametrize(
    "num_q_heads, num_k_heads, num_v_heads",
-    [(1, 1, 1), (4, 1, 1), (3, 3, 3), (6, 2, 2), (1, 1, 2), (2, 2, 4)],
+    [
+        (1, 1, 1),
+        (4, 1, 1),
+        (3, 3, 3),
+        (6, 2, 2),
+        (1, 1, 2),
+        (2, 2, 4),
+        (16, 16, 32),
+        (16, 16, 64),
+    ],
 )


This list of head configurations is identical to the one used for test_prefill_kernel_basic (lines 137-149). To improve maintainability and reduce duplication, consider defining this list as a module-level constant and reusing it in both test functions. For example:

    # At module level
    _PREFILL_HEAD_CONFIGS = [
        (1, 1, 1),
        (4, 1, 1),
        (3, 3, 3),
        (6, 2, 2),
        (1, 1, 2),
        (2, 2, 4),
        (16, 16, 32),
        (16, 16, 64),
    ]

    # In test decorators
    @pytest.mark.parametrize(
        "num_q_heads, num_k_heads, num_v_heads", _PREFILL_HEAD_CONFIGS
    )

yzh119 · 2026-02-10T11:17:03Z

/bot run

yzh119 · 2026-02-10T11:17:24Z

@flashinfer-bot run

flashinfer-bot · 2026-02-10T11:17:39Z

GitLab MR !306 has been created, and the CI pipeline #43685351 is currently running. I'll report back once the pipeline job completes.

flashinfer-bot · 2026-02-11T01:56:56Z

[FAILED] Pipeline #43685351: 10/20 passed

yongwww · 2026-02-11T17:00:03Z

@samuellees please rebase the PR onto the latest main to kick off CI.

Add test case for Qwen3N

83e5a84

gemini-code-assist bot reviewed Feb 10, 2026

flashinfer-bot added the run-ci label Feb 10, 2026

yzh119 approved these changes Feb 10, 2026

Merge branch 'flashinfer-ai:main' into gdn-perfill-testcase

5adc82f

yzh119 merged commit a003c02 into flashinfer-ai:main Feb 16, 2026
27 of 28 checks passed

yzh119 merged 2 commits into flashinfer-ai:main from samuellees:gdn-perfill-testcase