
tests: bmm_fp8 for SM110#2538

Merged
jimmyzho merged 1 commit into flashinfer-ai:main from jimmyzho:thor-ut
Feb 12, 2026

Conversation

@jimmyzho
Contributor

@jimmyzho jimmyzho commented Feb 11, 2026

📌 Description

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • New Features

    • Extended FP8 batch matrix-multiplication support to NVIDIA SM110 GPU architecture.
  • Tests

    • Added a test guard to skip FP8 e5m2 cases on SM110 with a note that SM110 FP8 is supported via the cutlass backend only (e5m2 not supported).

@jimmyzho jimmyzho self-assigned this Feb 11, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 11, 2026

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Walkthrough

Extended SM 110 compute capability support for two FP8 BMM requirement decorators and added a test skip for SM110 when torch.float8_e5m2 is involved.

Changes

  • FP8 BMM compute capability decorators (flashinfer/gemm/gemm_base.py): added compute capability 110 to the @supported_compute_capability lists for _cudnn_bmm_fp8_requirement and _cublas_bmm_fp8_requirement.
  • SM110 e5m2 test skip (tests/gemm/test_bmm_fp8.py): added a runtime skip on SM110 when input or mat2 uses torch.float8_e5m2, with a message noting that only cutlass supports SM110 and that it does not support e5m2.

Sequence Diagram(s)

(omitted — changes are metadata/test guard; no multi-component new control flow)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested labels

op: gemm

Suggested reviewers

  • nvmbreughe
  • yongwww
  • cyx-6

Poem

🐰 I hopped into code, small and spry,
SM110 waved a curious eye,
I nudged two lists to welcome the new,
And skipped e5m2 where cutlass must do. ✨

🚥 Pre-merge checks | ✅ 1 passed | ❌ 2 warnings

❌ Failed checks (2 warnings)

  • Description check ⚠️ Warning: the PR description is entirely empty; it contains only the template structure with unchecked boxes and placeholder comments, providing no description of the changes, rationale, or related issues. Resolution: fill in the Description section with details about the SM110 support additions, explain why the changes are needed, and complete or update the checklist items.
  • Docstring coverage ⚠️ Warning: docstring coverage is 25.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (1 passed)

  • Title check ✅ Passed: the title "tests: bmm_fp8 for SM110" directly and specifically summarizes the main change: adding SM110 compute capability support to the BMM FP8 test suite.


@gemini-code-assist
Contributor

Summary of Changes

Hello @jimmyzho, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly.

This pull request expands the compatibility of FP8 batch matrix multiplication operations within the FlashInfer library to support the SM110 compute capability. It ensures that the system correctly identifies and utilizes this new hardware capability while also refining the testing suite to account for specific FP8 format limitations on SM110.

Highlights

  • SM110 Compute Capability Support: Extended support for FP8 batch matrix multiplication (BMM) to include NVIDIA's SM110 compute capability in both cuDNN and cuBLAS requirements.
  • Test Updates for SM110: Introduced a new skip condition in the FP8 BMM tests to correctly handle the e5m2 FP8 format on SM110, as it is not supported by the Cutlass backend.


Changelog
  • flashinfer/gemm/gemm_base.py
    • Added SM110 to the list of supported compute capabilities for the _cudnn_bmm_fp8_requirement function.
    • Added SM110 to the list of supported compute capabilities for the _cublas_bmm_fp8_requirement function.
  • tests/gemm/test_bmm_fp8.py
    • Implemented a pytest.skip condition for SM110 when e5m2 FP8 format is used, as Cutlass does not support this combination.
Activity
  • No human activity has been recorded for this pull request yet.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service, and learn how to configure Gemini Code Assist in GitHub. Gemini can make mistakes, so double-check its output and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for SM110 to the bmm_fp8 operation for cudnn and cublas backends. The changes look reasonable, but I've found some inconsistencies between the implementation and the tests.

Specifically, the test test_bmm_fp8 now skips all e5m2 tests on SM110, with a confusing message. This suggests that e5m2 is not supported on SM110 for any backend. However, the requirement functions for cudnn and cublas in gemm_base.py do not reflect this limitation.

I've added comments to:

  • Suggest clarifying the test skip message in tests/gemm/test_bmm_fp8.py.
  • Recommend updating the requirement functions in flashinfer/gemm/gemm_base.py to explicitly reject e5m2 on SM110 if it's indeed unsupported, which would make the API more robust.

Addressing these points will improve the clarity and correctness of the SM110 support.



- @supported_compute_capability([89, 90, 100, 103, 120, 121])
+ @supported_compute_capability([89, 90, 100, 103, 110, 120, 121])

high

The corresponding test test_bmm_fp8 skips all tests on SM110 that use e5m2 dtypes. This suggests that the cudnn backend for bmm_fp8 might not support e5m2 on SM110. If this is the case, it would be better to add a check inside _cudnn_bmm_fp8_requirement to explicitly reject this combination and raise a ValueError, for consistency with _cutlass_bmm_fp8_requirement. This would make the API more robust. For example:

from ..utils import get_compute_capability

# ... inside _cudnn_bmm_fp8_requirement
major, _ = get_compute_capability(A.device)
if major == 11 and (A.dtype == torch.float8_e5m2 or B.dtype == torch.float8_e5m2):
    raise ValueError("bmm_fp8 with e5m2 is not supported on SM110 for cudnn backend")



- @supported_compute_capability([89, 90, 100, 103, 120, 121])
+ @supported_compute_capability([89, 90, 100, 103, 110, 120, 121])

high

Similar to the cudnn backend, the test test_bmm_fp8 skips all tests on SM110 with e5m2 dtypes. If the cublas backend for bmm_fp8 also doesn't support e5m2 on SM110, this requirement function should be updated to raise a ValueError for this combination. This would improve API robustness and clarity. For example:

from ..utils import get_compute_capability

# ... inside _cublas_bmm_fp8_requirement
major, _ = get_compute_capability(A.device)
if major == 11 and (A.dtype == torch.float8_e5m2 or B.dtype == torch.float8_e5m2):
    raise ValueError("bmm_fp8 with e5m2 is not supported on SM110 for cublas backend")

Comment on lines +32 to +37
    if compute_capability[0] == 11 and (
        input_dtype == torch.float8_e5m2 or mat2_dtype == torch.float8_e5m2
    ):
        pytest.skip(
            "Invalid combination: only cutlass supports SM110 which does not support e5m2"
        )

medium

The skip message is a bit confusing. It says "only cutlass supports SM110", but this pull request appears to add SM110 support for cublas and cudnn backends as well. If the intention is that none of the backends support e5m2 on SM110, a clearer message would be helpful to avoid confusion.

Suggested change

Current:

    if compute_capability[0] == 11 and (
        input_dtype == torch.float8_e5m2 or mat2_dtype == torch.float8_e5m2
    ):
        pytest.skip(
            "Invalid combination: only cutlass supports SM110 which does not support e5m2"
        )

Proposed:

    if compute_capability[0] == 11 and (
        input_dtype == torch.float8_e5m2 or mat2_dtype == torch.float8_e5m2
    ):
        pytest.skip(
            "e5m2 is not supported on SM110 for bmm_fp8 by any of the available backends."
        )
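The skip condition under discussion can be exercised in isolation. The sketch below mirrors the guard from tests/gemm/test_bmm_fp8.py but replaces the torch dtypes with plain strings so it runs without torch or a GPU; the helper function name is hypothetical and exists only for this illustration.

```python
# Hedged sketch of the SM110/e5m2 skip guard, with torch.float8_e5m2
# stood in by the string "float8_e5m2" so the logic runs standalone.
def should_skip_sm110_e5m2(compute_capability, input_dtype, mat2_dtype):
    # compute_capability is a (major, minor) tuple, e.g. (11, 0) for SM110.
    return compute_capability[0] == 11 and (
        input_dtype == "float8_e5m2" or mat2_dtype == "float8_e5m2"
    )

# SM110 with an e5m2 operand is skipped...
assert should_skip_sm110_e5m2((11, 0), "float8_e5m2", "float8_e4m3fn")
# ...while SM120 with e5m2, or SM110 without e5m2, still runs.
assert not should_skip_sm110_e5m2((12, 0), "float8_e5m2", "float8_e4m3fn")
assert not should_skip_sm110_e5m2((11, 0), "float8_e4m3fn", "float8_e4m3fn")
```

In the real test this boolean feeds pytest.skip, so the cases are reported as skipped rather than failed on SM110 hardware.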

@bkryu
Collaborator

bkryu commented Feb 11, 2026

/bot run

@flashinfer-bot
Collaborator

GitLab MR !308 has been created, and the CI pipeline #43797611 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[CANCELING] Pipeline #43797611: canceled

@jimmyzho jimmyzho enabled auto-merge (squash) February 11, 2026 23:05
@bkryu
Collaborator

bkryu commented Feb 12, 2026

@jimmyzho, can you rebase to re-trigger the public CI/merge check?

@jimmyzho jimmyzho merged commit 002087c into flashinfer-ai:main Feb 12, 2026
15 of 16 checks passed
