Commit c3f2596
fix: Fix trtllm-gen prefill IMA when batch_size==1 (#1912)
<!-- .github/pull_request_template.md -->
## 📌 Description
<!-- What does this PR do? Briefly describe the changes and why they’re
needed. -->
This PR fixes the illegal memory accesses (IMAs) hit by the test and benchmark code when running
trtllm-gen paged & ragged prefill with batch size 1 -- the issue was
described in #1898
Root cause of the issue:
`flashinfer.prefill.trtllm_ragged_attention_deepseek` and
`flashinfer.prefill.trtllm_batch_context_with_kv_cache` both require
`max_q_len` to match the length of the query when batch size is 1.
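For illustration, here is a minimal sketch of the caller-side contract that existed before this fix. The values and variable names below are made up for the example, and the real prefill calls take many more arguments than are mentioned here:

```python
# Sketch of the pre-fix contract: with batch_size == 1, callers had to pass a
# max_q_len that exactly equals the single request's query length, otherwise
# the trtllm-gen prefill kernels could read out of bounds (the IMA in #1898).
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

batch_size = 1
q_lens = [37]  # query length of the single request (illustrative value)

# Cumulative query lengths, analogous to the cum_seq_lens_q argument of the
# trtllm prefill entry points.
cum_seq_lens_q = torch.tensor([0] + q_lens, dtype=torch.int32, device=device)

# The value the caller was responsible for when batch_size == 1:
max_q_len = q_lens[0]
assert max_q_len == int(cum_seq_lens_q[-1])

# max_q_len and cum_seq_lens_q would then be passed, together with the
# query / KV tensors and the remaining arguments, to
# flashinfer.prefill.trtllm_batch_context_with_kv_cache or
# flashinfer.prefill.trtllm_ragged_attention_deepseek.
```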
**Updated PR:**
The issue has now been addressed on the kernel side, so the requirement
that "*`max_q_len` match the length of the query when batch size is 1*"
no longer applies.
This PR updates the trtllm-gen FMHA cubins to the latest version and
brings minor updates to the kernel metadata.
Unit test results after PR:
```
$ pytest tests/attention/test_trtllm_gen_attention.py
...
platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0
rootdir: /flashinfer
configfile: pytest.ini
collected 2320 items
...
2055 passed, 264 skipped, 1 xfailed in 224.43s (0:03:44)
```
**Description of previous solution:**
~~Updating `max_q_len` to `cum_seq_lens_q[-1].item()` inside the
`trtllm_ragged_attention_deepseek` or
`trtllm_batch_context_with_kv_cache` functions is not a viable option,
because the CPU-side synchronization breaks the deterministic, fully
device-side execution required during CUDA graph capture. The workaround
was therefore to update the test & benchmark code that calls the trtllm
prefill functions, and to state clearly in the docstrings that when
`batch_size == 1`, `max_q_len` must match the query length.~~
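For context on why the rejected approach breaks CUDA graph capture, here is a minimal, self-contained sketch (not FlashInfer code; the tensor below is illustrative): converting a device scalar to a Python value with `.item()` forces a device-to-host synchronization, and synchronization is not allowed while a CUDA graph is being captured.

```python
# Sketch: why reading cum_seq_lens_q[-1].item() inside the kernel wrappers
# would break CUDA graph capture. Requires a CUDA-capable device.
import torch

if torch.cuda.is_available():
    cum_seq_lens_q = torch.tensor([0, 37], dtype=torch.int32, device="cuda")

    # Outside capture this works, but it synchronizes the device with the host:
    max_q_len = cum_seq_lens_q[-1].item()

    # Warm up the captured work once before capture (recommended practice).
    _ = cum_seq_lens_q * 2
    torch.cuda.synchronize()

    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        # Purely device-side work is fine during capture ...
        doubled = cum_seq_lens_q * 2
        # ... but a host read such as cum_seq_lens_q[-1].item() here would
        # raise an error, because stream capture forbids synchronization.
    g.replay()
```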
## 🔍 Related Issues
#1898
<!-- Link any related issues here -->
## 🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.
### ✅ Pre-commit Checks
- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.
> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).
## 🧪 Tests
- [ ] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).
## Reviewer Notes
<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Removed the automatic batch_size=1 restriction for a native backend,
enabling its use in more scenarios while other constraints remain.
* **New Features**
* Added configurable block-sparse attention support to kernel
parameters.
* **Documentation**
* Clarified supported attention optimizations and backend capabilities
in the benchmarks docs.
* **Tests**
* Expanded tests with configurable sequence lengths and added dedicated
batch-size-1 test coverage.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Zihao Ye <[email protected]>

1 parent dad0682 · commit c3f2596
File tree (5 files changed, +63 −15 lines):
- benchmarks/routines
- flashinfer
- include/flashinfer/trtllm/fmha
- tests/attention