

@lburzawa commented Nov 14, 2025

Motivation

Improve the performance of the MoE a8w4 (8-bit activations, 4-bit weights) kernels.

Technical Details

  • Fused the 3 routing kernels into 2 for small batches (<= 16 tokens).
  • Tuned the kernel config for a batch size of 1024.
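The PR does not say which routing stages were merged. As a hedged illustration of the general idea only, the sketch below assumes a hypothetical three-stage routing pipeline (top-k expert selection, per-expert counting, gather-index computation) and shows a two-pass variant that counts experts while selecting them, dispatched when the batch has <= 16 tokens. It is a plain-Python host-side model, not the Triton implementation; every function name here (`topk_experts`, `route_fused`, `SMALL_BATCH_THRESHOLD`, etc.) is hypothetical.

```python
from typing import List, Tuple

# Hypothetical cutoff mirroring the "<= 16 tokens" condition in the PR text.
SMALL_BATCH_THRESHOLD = 16

Routing = Tuple[List[int], List[int]]  # (tokens per expert, token ids grouped by expert)


def topk_experts(logits: List[List[float]], k: int) -> List[List[int]]:
    """Hypothetical stage 1: pick the k highest-scoring experts for each token."""
    return [sorted(range(len(row)), key=lambda e: -row[e])[:k] for row in logits]


def count_per_expert(assign: List[List[int]], num_experts: int) -> List[int]:
    """Hypothetical stage 2: histogram of how many tokens go to each expert."""
    counts = [0] * num_experts
    for experts in assign:
        for e in experts:
            counts[e] += 1
    return counts


def gather_indices(assign: List[List[int]], counts: List[int]) -> List[int]:
    """Hypothetical stage 3: token ids laid out contiguously per expert."""
    # Exclusive prefix sum gives each expert's starting offset.
    offsets = [0] * len(counts)
    for e in range(1, len(counts)):
        offsets[e] = offsets[e - 1] + counts[e - 1]
    cursor = list(offsets)
    out = [0] * sum(counts)
    for token, experts in enumerate(assign):
        for e in experts:
            out[cursor[e]] = token
            cursor[e] += 1
    return out


def route_three_pass(logits: List[List[float]], k: int, num_experts: int) -> Routing:
    """Unfused path: three separate passes over the routing data."""
    assign = topk_experts(logits, k)
    counts = count_per_expert(assign, num_experts)
    return counts, gather_indices(assign, counts)


def route_fused(logits: List[List[float]], k: int, num_experts: int) -> Routing:
    """Fused path: selection and counting share one pass, then indices are built."""
    assign: List[List[int]] = []
    counts = [0] * num_experts
    for row in logits:
        experts = sorted(range(num_experts), key=lambda e: -row[e])[:k]
        for e in experts:
            counts[e] += 1  # count while selecting instead of in a separate pass
        assign.append(experts)
    return counts, gather_indices(assign, counts)


def route(logits: List[List[float]], k: int, num_experts: int) -> Routing:
    """Dispatch on batch size: the fused path only for small batches."""
    if len(logits) <= SMALL_BATCH_THRESHOLD:
        return route_fused(logits, k, num_experts)
    return route_three_pass(logits, k, num_experts)
```

Both paths must return identical routing; the fused path only reduces the number of passes, which mainly matters when the batch is too small to amortize per-kernel launch overhead.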

Test Plan

  • Correctness and performance tests.

Test Result

  • Correctness tests pass.
  • Microbenchmark and end-to-end performance improved for the targeted batch sizes.

Submission Checklist

@lburzawa requested review from azaidy and vgokhale November 14, 2025 03:48
@vgokhale changed the title from "Moe a8w4 tuning" to "[TRITON] Moe a8w4 tuning" Nov 14, 2025
