Use faster dequant for fp4 by awni · Pull Request #2720 · ml-explore/mlx

awni · 2025-10-31T14:13:13Z

Uses a faster and simpler dequant for FP4 in place of the LUT. This speeds up generation slightly (about 2% in the numbers below) and also simplifies the code.

Pre: generation_tps=121.29
Post: generation_tps=124.01

I didn't change the QMMs because the new dequant is slower there: the extra ops cost more than using the LUT.

angeloskath

Very nice!

awni · 2025-10-31T18:49:51Z

Unfortunately, I can't take credit for this. The very clever dequant here (putting the bits in an fp16 and scaling) was from Alex Kan.
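For readers unfamiliar with the trick, here is a minimal NumPy sketch of "putting the bits in an fp16 and scaling". It is not the Metal kernel from this PR. It assumes the FP4 codes use the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1), and the names `FP4_LUT` and `dequant_fp4_bits` are hypothetical, chosen only for illustration. The idea is that the three magnitude bits can be dropped into the low exponent bits and top mantissa bit of a half-precision float; the result differs from the FP4 value only by the exponent-bias gap (15 - 1 = 14), so one multiply by 2^14 corrects it, including for the subnormal code.

```python
import numpy as np

# Reference lookup table for E2M1 codes 0..15 (assumed layout: s e e m).
FP4_LUT = np.array(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
    dtype=np.float16,
)


def dequant_fp4_bits(codes):
    """Dequantize 4-bit E2M1 codes without a table (illustrative sketch)."""
    codes = np.asarray(codes, dtype=np.uint16)
    # Move the FP4 sign bit (bit 3) to the fp16 sign bit (bit 15).
    sign = (codes & 0x8) << 12
    # Move the exponent and mantissa bits (bits 2..0) to fp16 bits 11..9:
    # the two low exponent bits and the top mantissa bit.
    mag = (codes & 0x7) << 9
    half = (sign | mag).view(np.float16)
    # fp16 uses exponent bias 15, E2M1 uses bias 1, so rescale by 2**14.
    return half * np.float16(16384.0)


if __name__ == "__main__":
    all_codes = np.arange(16)
    print(dequant_fp4_bits(all_codes))
    print(np.array_equal(dequant_fp4_bits(all_codes), FP4_LUT))
```

The table version needs a memory lookup per element, while the bit version needs a few integer ops and one multiply. Which is faster depends on the kernel, which is consistent with the note above that the QMMs were left on the LUT.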

use faster dequant for fp4 qmv

9aa8483

awni requested a review from angeloskath October 31, 2025 14:15

angeloskath approved these changes Oct 31, 2025


awni merged commit 39b04ce into main Oct 31, 2025
5 checks passed

awni deleted the no_fp4_lut branch October 31, 2025 18:50

BrewTestBot mentioned this pull request Nov 20, 2025

mlx 0.30.0 Homebrew/homebrew-core#255173

Merged

1 task
