Add quantize/dequantize for mxfp8 and nvfp4 #2688

Merged
awni merged 11 commits into main from mxfp8_and_nvfp4
Oct 28, 2025

Conversation

awni (Member) commented Oct 20, 2025

Adds support for mxfp8 and nvfp4 in quantize/dequantize, along with kernels for the mx and nv quantization formats:

  • Ops-based fallback for CPU
  • Fast CUDA kernels
  • Fast Metal kernels
  • Defaults for bits and group size based on mode
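
For readers unfamiliar with block-scaled FP4, here is a rough pure-Python sketch of the idea behind an nvfp4-style quantize/dequantize round trip. It is not the code from this PR. The function names and the `FP4_E2M1_VALUES` table are made up for illustration, the group size of 16 is an assumption based on the usual NVFP4 convention, and the per-group scale is kept as a plain Python float rather than being encoded to FP8 as a real kernel would do.

```python
# Illustrative sketch only; names are hypothetical and this is not the MLX code.

# Magnitudes representable by a 4-bit E2M1 float (the sign is stored separately).
FP4_E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = FP4_E2M1_VALUES[-1]


def quantize_fp4_groups(values, group_size=16):
    """Quantize a flat list of floats in groups that share one scale.

    Returns (codes, scales): codes holds one (is_negative, grid_index) pair
    per input value, scales holds one float per group.
    Assumption: the scale stays a Python float (no FP8 rounding).
    """
    if len(values) % group_size != 0:
        raise ValueError("length must be a multiple of group_size")
    codes, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        amax = max(abs(v) for v in group)
        # Map the largest magnitude in the group onto the largest FP4 value.
        scale = amax / FP4_MAX if amax > 0 else 1.0
        scales.append(scale)
        for v in group:
            mag = abs(v) / scale
            # Round to the nearest representable magnitude.
            idx = min(range(len(FP4_E2M1_VALUES)),
                      key=lambda i: abs(FP4_E2M1_VALUES[i] - mag))
            codes.append((v < 0, idx))
    return codes, scales


def dequantize_fp4_groups(codes, scales, group_size=16):
    """Invert quantize_fp4_groups, up to rounding error."""
    out = []
    for i, (negative, idx) in enumerate(codes):
        val = FP4_E2M1_VALUES[idx] * scales[i // group_size]
        out.append(-val if negative else val)
    return out


if __name__ == "__main__":
    w = [float(i - 8) for i in range(16)]
    codes, scales = quantize_fp4_groups(w)
    print(dequantize_fp4_groups(codes, scales))
```

Because the largest gap in the FP4 grid is 2 (between 4 and 6), the round-trip error of any element in this sketch is at most one scale unit of its group. An mxfp8-style variant would follow the same pattern with a larger group and an 8-bit element grid.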

CC @nastya236

awni changed the title from "Add ops-based quantize/dequantize for mxfp8 and nvfp4" to "Add quantize/dequantize for mxfp8 and nvfp4" on Oct 21, 2025
awni force-pushed the mxfp8_and_nvfp4 branch 3 times, most recently from aafe1fa to 92dbc55, on October 24, 2025 20:28
awni requested a review from angeloskath on October 28, 2025 19:53

angeloskath (Member) left a comment

Looks awesome!

awni merged commit ec72b44 into main on Oct 28, 2025
6 checks passed
awni deleted the mxfp8_and_nvfp4 branch on October 28, 2025 23:23

3 participants