fix: include fp8_blockscale_gemm_90 in AOT jit-cache by Edward-lyz · Pull Request #2533 · flashinfer-ai/flashinfer

Edward-lyz · 2026-02-10T08:33:37Z

Summary

Add fp8_blockscale_gemm_90 (gen_fp8_blockscale_gemm_sm90_module) to the AOT build list when SM90 is enabled.
Avoid runtime JIT compilation for fp8_blockscale_gemm_sm90 in environments without CUDA dev headers, which can fail with cublasLt.h not found.

Changes

flashinfer/aot.py: append gen_fp8_blockscale_gemm_sm90_module() under add_moe + has_sm90 gating.

Related Issues

Fixes [bug][cubin] Binaries for fp8_blockscale_gemm_sm90 not present in flashinfer-cubin and flashinfer-jit-cache #2527
[bug][cubin] Binaries for fp8_blockscale_gemm_sm90 not present in flashinfer-cubin and flashinfer-jit-cache #2527

Tests

📌 Description

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

I have installed pre-commit by running pip install pre-commit (or used your preferred method).
I have installed the hooks with pre-commit install.
I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

Tests have been added or updated as needed.
[] All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

New Features
- Added support for FP8 blockscale matrix multiplication operations on SM90 GPU architecture.

## Summary - Add fp8_blockscale_gemm_90 (gen_fp8_blockscale_gemm_sm90_module) to the AOT build list when SM90 is enabled. - Avoid runtime JIT compilation for fp8_blockscale_gemm_sm90 in environments without CUDA dev headers, which can fail with cublasLt.h not found. ## Changes - flashinfer/aot.py: append gen_fp8_blockscale_gemm_sm90_module() under add_moe + has_sm90 gating. ## Related Issues - Fixes flashinfer-ai#2527 - flashinfer-ai#2527 ## Tests - Not run locally (no SM90 GPU in this environment).

gemini-code-assist · 2026-02-10T08:33:49Z

Summary of Changes

Hello @Edward-lyz, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the robustness of the FlashInfer library by ensuring that the fp8_blockscale_gemm_sm90 module is pre-compiled for SM90-enabled systems. This change eliminates the need for runtime JIT compilation of this specific module, thereby mitigating issues related to missing CUDA development headers and improving overall stability and deployment predictability.

Highlights

AOT Compilation: Included the fp8_blockscale_gemm_90 module in the Ahead-Of-Time (AOT) build list for SM90 architectures.
Runtime Stability: Prevented potential runtime JIT compilation failures for fp8_blockscale_gemm_sm90 in environments lacking CUDA development headers, specifically addressing cublasLt.h not found errors.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog

flashinfer/aot.py
- Imported gen_fp8_blockscale_gemm_sm90_module from flashinfer.jit.gemm.
- Appended gen_fp8_blockscale_gemm_sm90_module() to the jit_specs list within the gen_all_modules function, specifically under the has_sm90 conditional block.

Activity

Pre-commit checks were installed and run successfully.
Tests have been added or updated as needed, and all tests are passing.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

coderabbitai · 2026-02-10T08:34:04Z

📝 Walkthrough

Walkthrough

Added an import and invocation of gen_fp8_blockscale_gemm_sm90_module() to the AOT module generation pipeline in the MOE build path. This ensures FP8 blockscale GEMM kernels for SM90 are included in the precompiled binary cache.

Changes

Cohort / File(s)	Summary
FP8 Blockscale GEMM SM90 AOT Integration `flashinfer/aot.py`	Added import of `gen_fp8_blockscale_gemm_sm90_module` and appended its invocation to the JIT specs generation under the MOE/SM90 branch to ensure precompiled kernels are available.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Possibly related PRs

feat: enable deepgemm jit for fp8 block-scale on SM90 #1969: Enables JIT compilation paths for FP8 block-scale GEMM kernels on SM90; this PR complements it by ensuring those kernels are included in AOT precompiled binaries.

Suggested reviewers

yzh119
cyx-6
bkryu
nvmbreughe

Poem

🐰✨ A blockscale gem, so FP8 and fine,
SM90's kernel joins the cache divine,
No more JIT when binaries align,
The rabbit hops—another build refine! 🔧

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely describes the main change: including fp8_blockscale_gemm_90 in the AOT jit-cache.
Description check	✅ Passed	The PR description addresses the core issue with good summary and rationale, though the template structure is partially duplicated.
Linked Issues check	✅ Passed	The code change successfully addresses the primary objective from issue `#2527` by adding fp8_blockscale_gemm_sm90_module to the AOT build list.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to the stated objective: adding fp8_blockscale_gemm_sm90_module to AOT compilation in flashinfer/aot.py.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

No actionable comments were generated in the recent review. 🎉

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request correctly adds the fp8_blockscale_gemm_90 module to the Ahead-of-Time (AOT) compilation list. This change will prevent runtime JIT compilation failures in environments lacking CUDA development headers, which is a valuable improvement. The implementation is straightforward and correctly places the new module under the add_moe and has_sm90 flags, which is consistent with how other GEMM kernels are handled in the project. The changes look good and address the intended issue effectively.

yongwww · 2026-02-12T15:10:25Z

@flashinfer-bot run

yzh119

Thanks for working on this fix!

aleozlx · 2026-02-12T23:30:22Z

public ci seems still not started

aleozlx · 2026-02-12T23:30:30Z

@flashinfer-bot run

Edward-lyz requested review from bkryu, cyx-6, jimmyzho, nvmbreughe and yzh119 as code owners February 10, 2026 08:33

gemini-code-assist bot reviewed Feb 10, 2026

View reviewed changes

flashinfer-bot added the run-ci label Feb 12, 2026

yzh119 approved these changes Feb 12, 2026

View reviewed changes

aleozlx self-assigned this Feb 12, 2026

aleozlx added the v0.6.4 release blocker label for v0.6.4 label Feb 12, 2026

aleozlx removed their assignment Feb 12, 2026

yongwww approved these changes Feb 12, 2026

View reviewed changes

yongwww merged commit 292f9be into flashinfer-ai:main Feb 13, 2026
34 of 50 checks passed

mgoin mentioned this pull request Feb 16, 2026

[CI] Actually run tests/kernels/quantization/test_block_fp8.py in CI vllm-project/vllm#34274

Merged

5 tasks

coderabbitai bot mentioned this pull request Feb 27, 2026

Enable sm120f compilation #2650

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: include fp8_blockscale_gemm_90 in AOT jit-cache#2533

fix: include fp8_blockscale_gemm_90 in AOT jit-cache#2533
yongwww merged 1 commit intoflashinfer-ai:mainfrom
Edward-lyz:main

Edward-lyz commented Feb 10, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

gemini-code-assist bot commented Feb 10, 2026

Uh oh!

coderabbitai bot commented Feb 10, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

yongwww commented Feb 12, 2026

Uh oh!

yzh119 left a comment

Uh oh!

aleozlx commented Feb 12, 2026

Uh oh!

aleozlx commented Feb 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

Edward-lyz commented Feb 10, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Related Issues

Tests

📌 Description

🔍 Related Issues

🚀 Pull Request Checklist

✅ Pre-commit Checks

🧪 Tests

Reviewer Notes

Summary by CodeRabbit

Uh oh!

gemini-code-assist bot commented Feb 10, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

coderabbitai bot commented Feb 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

yongwww commented Feb 12, 2026

Uh oh!

yzh119 left a comment

Choose a reason for hiding this comment

Uh oh!

aleozlx commented Feb 12, 2026

Uh oh!

aleozlx commented Feb 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Edward-lyz commented Feb 10, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 10, 2026 •

edited

Loading