Skip to content

add additional launch arg for extraFlag2#2

Merged
murphymatt merged 2 commits intoglm_v4_routingfrom
fix-kernel-launch-config
Nov 10, 2025
Merged

add additional launch arg for extraFlag2#2
murphymatt merged 2 commits intoglm_v4_routingfrom
fix-kernel-launch-config

Conversation

@murphymatt
Copy link
Contributor

@murphymatt murphymatt commented Nov 6, 2025

📌 Description

We were missing an arg for extraFlag2

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

@murphymatt murphymatt requested a review from divchenko November 6, 2025 21:01
Copy link

@divchenko divchenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a unit test so we can verify that it's fixed?

@murphymatt murphymatt merged commit d6bbb21 into glm_v4_routing Nov 10, 2025
1 check passed
murphymatt pushed a commit that referenced this pull request Feb 27, 2026
<!-- .github/pull_request_template.md -->

## 📌 Description

To fix the following bug:
When the CuteDSL MoE kernels were ported from TensorRT-LLM to
FlashInfer, the mPtrPermutedIdxToExpandedIdx field was accidentally
dropped from the routing kernel's DataBase struct in RoutingKernel.h.
TRT-LLM's routing kernel produces three reverse-mapping outputs:

1. mPtrExpandedIdxToPermutedIdx[expandedIdx] = permutedIdx — forward
mapping
2. mPtrPermutedIdxToExpandedIdx[permutedIdx] = expandedIdx — reverse to
expanded index (token_idx * topk + k)
3. mPtrPermutedIdxToTokenIdx[permutedIdx] = tokenIdx — reverse to token
index only

FlashInfer's port kept only #1 and #3, dropping #2. The binding in
moe_utils_binding.cu then had to wire the Python buffer
permuted_idx_to_expanded_idx to the only available reverse-mapping field
— mPtrPermutedIdxToTokenIdx — which writes plain tokenIdx instead of
expandedIdx.
The Impact
The CuteDSL kernels (GEMM1 gather, moe_output_memset, GEMM2 finalize)
all expect expanded indices and derive the token index via expanded_idx
// topk. When they received plain tokenIdx instead, they computed
tokenIdx // topk — yielding the wrong A row for gather, wrong zero-init
for memset, and wrong scatter position + wrong routing scale for
finalize.

<!-- What does this PR do? Briefly describe the changes and why they’re
needed. -->

## 🔍 Related Issues

<!-- Link any related issues here -->

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [ ] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [ ] I have installed the hooks with `pre-commit install`.
- [ ] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [ ] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).

## Reviewer Notes

<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Refactor**
* Refined MOE (Mixture of Experts) routing infrastructure by extending
index mapping capabilities across multiple kernel implementations to
improve internal data flow consistency.

* **Tests**
* Strengthened accuracy validation thresholds from 0.925 to 0.97 with
adjusted error tolerance parameters, ensuring more rigorous testing of
MOE operations under FP4 quantization conditions.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants