Use faster dequant for fp4 #2720

Merged: awni merged 1 commit into main from no_fp4_lut on Oct 31, 2025

@awni (Member) commented Oct 31, 2025

Uses a faster and simpler dequantization for FP4 in place of the lookup table (LUT). It speeds up generation slightly and also simplifies the code.

Pre: generation_tps=121.29
Post: generation_tps=124.01 (about 2.2% higher)

For QMMs (quantized matrix-matrix multiplications) I didn't change it, because the new dequant is slower there: the extra arithmetic ops cost more than the LUT lookup.
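To make the two approaches concrete, here is a minimal Python sketch (an illustration only, not the PR's kernel code). It assumes the FP4 codes use the OCP E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1), as in MXFP4, and that two codes are packed per byte with the low nibble first; `dequant_lut`, `dequant_bits`, and `dequant_group` are hypothetical names. The LUT path indexes a 16-entry table, while the bit path moves the code's bits into an fp16 bit pattern and multiplies by 2**14.

```python
import struct

# Assumption: FP4 codes are OCP E2M1 (sign, 2 exponent bits, 1 mantissa bit,
# bias 1). Nibble order and function names are hypothetical, for illustration.

# LUT path: one table entry per 4-bit code; codes 8..15 are the negative counterparts.
FP4_LUT = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
           -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]


def dequant_lut(code: int) -> float:
    """Dequantize one 4-bit code with a table lookup."""
    return FP4_LUT[code & 0xF]


def half_from_bits(bits: int) -> float:
    """Reinterpret a 16-bit pattern as an IEEE 754 fp16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def dequant_bits(code: int) -> float:
    """Dequantize one 4-bit code by placing its bits inside an fp16."""
    # Sign bit (bit 3) -> fp16 sign bit (bit 15).
    # Exponent/mantissa bits (bits 0..2) -> fp16 bits 9..11, so the two E2M1
    # exponent bits land in the low bits of the fp16 exponent field and the
    # E2M1 mantissa bit becomes the top fp16 mantissa bit.
    bits = ((code & 0x8) << 12) | ((code & 0x7) << 9)
    # fp16's exponent bias (15) vs. E2M1's (1) leaves every value, including
    # subnormals, exactly 2**-14 too small, so one multiply restores it.
    return half_from_bits(bits) * 2.0**14


def dequant_group(packed: bytes, scale: float, use_lut: bool = False) -> list[float]:
    """Hypothetical helper: unpack two codes per byte (low nibble first)
    and apply the group's scale."""
    fn = dequant_lut if use_lut else dequant_bits
    out = []
    for byte in packed:
        out.append(fn(byte & 0xF) * scale)
        out.append(fn(byte >> 4) * scale)
    return out
```

For example, `dequant_group(bytes([0x71, 0xF2]), 0.25)` returns `[0.125, 1.5, 0.25, -1.5]` with either path. In a real kernel the 2**14 factor could presumably be folded into the per-group scale, leaving only a few integer bit operations per value on top of the scale multiply that is needed anyway; whether that beats a table lookup depends on the kernel, which is consistent with QMMs still favoring the LUT.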

@awni requested a review from @angeloskath on October 31, 2025 14:15
@angeloskath (Member) left a comment

Very nice!

@awni (Member, Author) commented Oct 31, 2025

Unfortunately, I can't take credit for this. The very clever dequant here (putting the FP4 bits into an fp16 and scaling the result) came from Alex Kan.
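Why a single scale factor is enough can be checked directly. Assuming E2M1 codes (exponent bias 1) and the standard fp16 layout (exponent bias 15), and assuming the code's sign bit goes to the fp16 sign bit while its exponent bits $e \in \{0,1,2,3\}$ and mantissa bit $m \in \{0,1\}$ go to the low exponent bits and top mantissa bit, the two interpretations of the same bits are:

```math
v_{\text{fp4}} =
\begin{cases}
(-1)^s \, 2^{e-1}\left(1 + \tfrac{m}{2}\right), & e > 0 \\
(-1)^s \, \tfrac{m}{2}, & e = 0
\end{cases}
\qquad
v_{\text{fp16}} =
\begin{cases}
(-1)^s \, 2^{e-15}\left(1 + \tfrac{m}{2}\right), & e > 0 \\
(-1)^s \, 2^{-14}\,\tfrac{m}{2}, & e = 0
\end{cases}
```

In both the normal and subnormal cases $v_{\text{fp16}} = 2^{-14}\, v_{\text{fp4}}$, so a dequantized weight is $w = \text{scale} \cdot 2^{14} \cdot v_{\text{fp16}}$, with no per-value branching or table lookup.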

@awni merged commit 39b04ce into main on Oct 31, 2025
5 checks passed
@awni deleted the no_fp4_lut branch on October 31, 2025 18:50