
Add test case for Qwen3N #2532

Merged
yzh119 merged 2 commits into flashinfer-ai:main from samuellees:gdn-perfill-testcase
Feb 16, 2026

Conversation


samuellees (Contributor) commented Feb 10, 2026

📌 Description

Add test cases for Qwen3N and Qwen3.5, following vllm-project/vllm#34131.

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or using my preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Tests
    • Expanded test coverage by adding additional head-configuration cases across multiple test scenarios to improve reliability and catch more edge cases.
    • No changes to test logic or public interfaces; only parameterized inputs were extended.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @samuellees, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the testing suite for the prefill kernel by incorporating additional attention head configurations. These new test cases are crucial for validating the kernel's compatibility and performance with advanced model architectures, such as Qwen3N and Qwen3.5, thereby improving the overall robustness and reliability of the system.

Highlights

  • Expanded Test Coverage: Added new num_q_heads, num_k_heads, num_v_heads configurations, specifically (16, 16, 32) and (16, 16, 64), to the _test_prefill_kernel and test_prefill_kernel_basic functions.
  • Qwen Model Support: Introduced test cases relevant to Qwen3N and Qwen3.5 models by including their characteristic attention head configurations.
  • Prefill Kernel Concatenation Tests: Extended the test_prefill_kernel_concat function with the new attention head configurations to ensure proper behavior during sequence concatenation.


Changelog
  • tests/gdn/test_prefill_delta_rule.py
    • Expanded the num_q_heads, num_k_heads, num_v_heads parameter sets for _test_prefill_kernel and test_prefill_kernel_basic to include (16, 16, 32) and (16, 16, 64).
    • Added (16, 16, 32) and (16, 16, 64) to the num_q_heads, num_k_heads, num_v_heads parameter set for test_prefill_kernel_concat.
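For illustration, the expanded parametrization described in this changelog could be sketched as follows. This is a reconstruction, not the actual file contents: the list name HEAD_CONFIGS and the placeholder test body are assumptions made for the sketch.

```python
# Sketch of the expanded head-config parametrization (illustrative only;
# the constant name and the placeholder test body are assumptions, not the
# actual contents of tests/gdn/test_prefill_delta_rule.py).
import pytest

# (num_q_heads, num_k_heads, num_v_heads); the last two tuples are the
# Qwen-style configurations added by this PR.
HEAD_CONFIGS = [
    (1, 1, 1),
    (4, 1, 1),
    (3, 3, 3),
    (6, 2, 2),
    (1, 1, 2),
    (2, 2, 4),
    (16, 16, 32),  # new in this PR
    (16, 16, 64),  # new in this PR
]


@pytest.mark.parametrize("num_q_heads, num_k_heads, num_v_heads", HEAD_CONFIGS)
def test_prefill_kernel_basic(num_q_heads, num_k_heads, num_v_heads):
    # Placeholder body; the real test exercises the prefill delta-rule kernel.
    # In every listed config, num_v_heads is a whole multiple of num_k_heads.
    assert num_v_heads % num_k_heads == 0
```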


coderabbitai bot commented Feb 10, 2026

📝 Walkthrough

Added two larger head-configuration tuples, (16, 16, 32) and (16, 16, 64), to parametrization decorators in the prefill delta rule tests, expanding test coverage without changing test logic or public interfaces.

Changes

  • Test Parameter Expansion (tests/gdn/test_prefill_delta_rule.py): Added head configuration tuples (16, 16, 32) and (16, 16, 64) to multiple pytest parametrize decorators for prefill-related tests. No code logic, assertions, or public signatures changed.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐰 New heads hop in, a joyful spree,
Two sizes more to test with glee,
(16,16,32) and (16,16,64) in view,
I nibble coverage, now broader and true,
Hooray for tests — a carrot or two! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)
  • Title check (⚠️ Warning): The title mentions 'Qwen3N', but the changes only expand parametric test configurations (adding the (16, 16, 32) and (16, 16, 64) head configurations) with no Qwen3N-specific implementation. Resolution: update the title to reflect the primary change, e.g. 'Expand parametric test configurations for prefill kernel'.
  • Description check (⚠️ Warning): The description mentions adding test cases for Qwen3N and Qwen3.5, but the code changes only extend existing test parametrization with additional head configurations; no Qwen3N/3.5-specific test cases are present. Resolution: either add the Qwen3N/3.5 test cases as described, or revise the description to reflect the parametrization expansion.

✅ Passed checks (1 passed)
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.


No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
tests/gdn/test_prefill_delta_rule.py (1)

411-412: Consider the CI impact of the expanded test matrix.

The chunked prefill test doubles its head-config count (2 → 4). Combined with the other two tests, total parameterized cases grow substantially. Given the CI pipeline is already failing (10/20 jobs), it may be worth confirming whether the failures are related to these larger configs (e.g., GPU memory or timeouts) or are pre-existing on main.
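To ground the point about the growing test matrix, here is a rough count. This is a sketch: the per-test baselines of 6, 6, and 2 head configs are read from the diff and this comment, and treating the "chunked prefill test" as test_prefill_kernel_concat is an assumption.

```python
# Rough count of head-config rows before/after this PR (baseline counts are
# assumptions taken from the diff and the review comment above; other
# parametrize axes multiply these numbers further in the real suite).
before = {
    "_test_prefill_kernel": 6,
    "test_prefill_kernel_basic": 6,
    "test_prefill_kernel_concat": 2,  # "chunked prefill test": 2 -> 4
}
added = 2  # (16, 16, 32) and (16, 16, 64)
after = {name: count + added for name, count in before.items()}

growth = sum(after.values()) - sum(before.values())
print(growth)  # prints 6: six more head-config rows across the three tests
```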


@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request adds new test cases for Grouped-Value Attention (GVA) configurations, likely for Qwen3N and Qwen3.5 models. The changes correctly expand test coverage. I've included one suggestion to refactor duplicated test parameters for improved maintainability.

Comment on lines 188 to 200

  @pytest.mark.parametrize(
      "num_q_heads, num_k_heads, num_v_heads",
-     [(1, 1, 1), (4, 1, 1), (3, 3, 3), (6, 2, 2), (1, 1, 2), (2, 2, 4)],
+     [
+         (1, 1, 1),
+         (4, 1, 1),
+         (3, 3, 3),
+         (6, 2, 2),
+         (1, 1, 2),
+         (2, 2, 4),
+         (16, 16, 32),
+         (16, 16, 64),
+     ],
  )
Severity: medium

This list of head configurations is identical to the one used for test_prefill_kernel_basic (lines 137-149). To improve maintainability and reduce duplication, consider defining this list as a module-level constant and reusing it in both test functions. For example:

# At module level
_PREFILL_HEAD_CONFIGS = [
    (1, 1, 1),
    (4, 1, 1),
    (3, 3, 3),
    (6, 2, 2),
    (1, 1, 2),
    (2, 2, 4),
    (16, 16, 32),
    (16, 16, 64),
]

# In test decorators
@pytest.mark.parametrize(
    "num_q_heads, num_k_heads, num_v_heads", _PREFILL_HEAD_CONFIGS
)


yzh119 (Collaborator) commented Feb 10, 2026

/bot run


yzh119 (Collaborator) commented Feb 10, 2026

@flashinfer-bot run

@flashinfer-bot (Collaborator) commented

GitLab MR !306 has been created, and the CI pipeline #43685351 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot (Collaborator) commented

[FAILED] Pipeline #43685351: 10/20 passed


yongwww (Member) commented Feb 11, 2026

@samuellees please rebase the PR onto the latest main to kick off CI.

@yzh119 yzh119 merged commit a003c02 into flashinfer-ai:main Feb 16, 2026
27 of 28 checks passed


4 participants