Commit f9373aa
chore: MoE benchmark effective BW fix for trtllm_block_scale_moe (#2341)
## 📌 Description
The MoE benchmark script overestimates the number of bytes loaded by assuming
that all experts are active. As a result, I saw the reported effective
bandwidth exceed 3x the peak bandwidth of some systems. The fix is to compute
the routed experts (`topk_ids`) on the host and count the number of unique
experts, the same logic that `cutlass_fused_moe` uses.
While investigating the above issue, I also found that initializing
`routing_bias` with `rand()` results in a very skewed expert distribution
(the repro command below activates only 18 of 128 experts). I'd like to
change it to `ones()*0.1` for a smoother expert distribution (now 114 of
128 experts), while keeping the same load/compute behavior in the
kernels.
```
python3 flashinfer_benchmark.py --routine trtllm_fp4_block_scale_moe --num_tokens 32 --hidden_size 7168 --intermediate_size 2048 --num_experts 128 --routing_method deepseek_v3 --top_k 8 --n_group 8 --topk_group 4 --routed_scaling_factor 2.5 --use_routing_bias --use_shuffled_weight --generate_repro_command -vv
```
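The repro command uses DeepSeek-V3 routing (`--routing_method deepseek_v3`) with `--n_group 8 --topk_group 4 --top_k 8 --routed_scaling_factor 2.5`. The sketch below is a simplified, pure-Python take on that group-limited routing, following the publicly described DeepSeek-V3 scheme (sigmoid scores, bias used only for expert selection, groups ranked by the sum of their top-2 biased scores, weights renormalized and scaled). It is not the benchmark's code: `deepseek_v3_route` and `count_active_experts` are hypothetical names, and logits are assumed to be standard normal, so the counts it prints will not reproduce the 18/114 figures above.

```python
import math
import random
from typing import List, Tuple


def deepseek_v3_route(logits: List[float], bias: List[float], top_k: int, n_group: int,
                      topk_group: int, routed_scaling_factor: float) -> Tuple[List[int], List[float]]:
    """Group-limited top-k routing for one token (simplified host-side sketch)."""
    num_experts = len(logits)
    group_size = num_experts // n_group
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]  # unbiased sigmoid scores
    biased = [s + b for s, b in zip(scores, bias)]          # bias only affects selection
    # Score each group by the sum of its two highest biased scores; keep the best topk_group.
    group_score = [sum(sorted(biased[g * group_size:(g + 1) * group_size], reverse=True)[:2])
                   for g in range(n_group)]
    kept = sorted(range(n_group), key=lambda g: group_score[g], reverse=True)[:topk_group]
    candidates = [e for g in kept for e in range(g * group_size, (g + 1) * group_size)]
    # Pick top_k experts from the surviving groups, again ranked by biased score.
    topk_ids = sorted(candidates, key=lambda e: biased[e], reverse=True)[:top_k]
    # Routing weights use the unbiased scores, renormalized and scaled.
    total = sum(scores[e] for e in topk_ids)
    weights = [scores[e] / total * routed_scaling_factor for e in topk_ids]
    return topk_ids, weights


def count_active_experts(bias: List[float], num_tokens: int = 32, seed: int = 0) -> int:
    """Route num_tokens tokens with N(0, 1) logits and count the distinct experts hit."""
    rng = random.Random(seed)
    used = set()
    for _ in range(num_tokens):
        logits = [rng.gauss(0.0, 1.0) for _ in range(len(bias))]
        ids, _ = deepseek_v3_route(logits, bias, top_k=8, n_group=8, topk_group=4,
                                   routed_scaling_factor=2.5)
        used.update(ids)
    return len(used)


num_experts = 128
bias_rng = random.Random(1)
rand_bias = [bias_rng.random() for _ in range(num_experts)]  # like rand(): U[0, 1) per expert
const_bias = [0.1] * num_experts                              # like ones() * 0.1
print("rand bias :", count_active_experts(rand_bias), "of", num_experts)
print("const bias:", count_active_experts(const_bias), "of", num_experts)
```

A constant bias shifts every expert's score, and every group's top-2 sum, by the same amount, so it cannot change which experts are selected. A per-expert `U[0, 1)` bias is of the same magnitude as the sigmoid scores and can dominate the selection, which is one plausible source of the skew described above.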
## 🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.
### ✅ Pre-commit Checks
- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.
> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).
## 🧪 Tests
- [ ] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).
## Summary by CodeRabbit
* **New Features**
  * Added support for nvfp4 and mxfp4 quantization formats in bandwidth calculations.
  * Introduced routing support for the DeepSeekV3 method.
* **Improvements**
  * Enhanced routing bias initialization for more consistent expert distribution.
  * Expanded routing computation utilities for greater flexibility.
* **Tests**
  * Updated benchmark test data to align with new routing and quantization logic.
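Supporting nvfp4 and mxfp4 in the bandwidth calculation comes down to the bytes stored per weight element. A minimal sketch, assuming the standard block-scaled layouts (NVFP4: one 1-byte FP8 E4M3 scale per 16 values; MXFP4: one 1-byte E8M0 scale per 32 values) and ignoring NVFP4's per-tensor global scale; the helper names are hypothetical and the benchmark's exact accounting may differ.

```python
from fractions import Fraction


def fp4_bytes_per_elem(block_size: int, scale_bytes: int = 1) -> Fraction:
    """Packed 4-bit values (half a byte each) plus one scale per block_size elements."""
    return Fraction(1, 2) + Fraction(scale_bytes, block_size)


NVFP4_BPE = fp4_bytes_per_elem(block_size=16)  # 1-byte FP8 (E4M3) scale per 16 values
MXFP4_BPE = fp4_bytes_per_elem(block_size=32)  # 1-byte E8M0 scale per 32 values


def weight_bytes(rows: int, cols: int, bpe: Fraction) -> int:
    """Bytes for a [rows, cols] quantized weight (assumes cols is a multiple of the block size)."""
    return int(rows * cols * bpe)


# GEMM2 weight of one expert, [hidden_size, intermediate_size] = [7168, 2048]:
print(weight_bytes(7168, 2048, NVFP4_BPE))  # 8257536
print(weight_bytes(7168, 2048, MXFP4_BPE))  # 7798784
```

Exact fractions (9/16 and 17/32 bytes per element) avoid rounding drift when multiplying by large matrix sizes.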
1 parent cc1a362, commit f9373aa
1 file changed: +126 / -20 lines.

| Hunk | Original lines | New lines | Removed (original) | Added (new) |
|---|---|---|---|---|
| 1 | 13-19 | 13-21 | none | 16, 18 |
| 2 | 316-322 | 318-327 | 319 | 321-324 |
| 3 | 430-450 | 435-458 | 438-439, 445-447 | 438, 444-445, 451-455 |
| 4 | 472-477 | 480-488 | none | 483-485 |
| 5 | 490-495 | 501-568 | none | 504-565 |
| 6 | 588-593 | 661-678 | none | 664-675 |
| 7 | 781-789 | 866-876 | 784-785 | 869-870, 872-873 |
| 8 | 1142-1161 | 1229-1239 | 1145-1156 | 1232-1233, 1236 |
| 9 | 1278-1283 | 1356-1373 | none | 1359-1370 |
| 10 | 1412-1417 | 1502-1509 | none | 1505-1506 |
| 11 | 1533-1538 | 1625-1642 | none | 1628-1639 |
| 12 | 1630-1635 | 1734-1741 | none | 1737-1738 |
| |||