
fallback for cuda < 12.8 #2697

Merged

awni merged 1 commit into ml-explore:mxfp8_and_nvfp4 from awni:mxfp8_and_nvfp4 on Oct 23, 2025

Conversation

@awni (Member) commented Oct 23, 2025

As title.

@awni awni merged commit 8be324c into ml-explore:mxfp8_and_nvfp4 Oct 23, 2025
1 check was pending
awni pushed a commit that referenced this pull request Oct 27, 2025
awni pushed a commit that referenced this pull request Oct 28, 2025
* Add quantize/dequantize slow path for mxfp8 and nvfp4

* fast cuda kernel for mx/nv quantization

* fallback for cuda < 12.8 (#2697)

* format (#2700)

* fix (#2701)

* metal kernels

* docs

* fix jit

* add default bits and group sizes

* improve quant docs

* fix output type of mxfp4 matmuls
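The commit list above adds a slow (reference) quantize/dequantize path for the mx/nv formats, used as the fallback when the fast CUDA kernels are unavailable (CUDA < 12.8). As a rough illustration of the idea behind MXFP4-style microscaling quantization — groups of 32 elements sharing one power-of-two scale, each element rounded to a 4-bit E2M1 value — here is a hedged NumPy sketch. The function names, group handling, and rounding details are illustrative assumptions, not MLX's actual implementation:

```python
import numpy as np

# Magnitudes representable in FP4 (E2M1): max value is 6.0
FP4_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mxfp4(x):
    """Quantize a 1-D array (length a multiple of 32) into (codes, scales).

    Each group of 32 elements shares one power-of-two scale, chosen so
    the largest magnitude in the group fits within FP4's max value (6.0).
    """
    groups = x.reshape(-1, 32)
    amax = np.abs(groups).max(axis=1, keepdims=True)
    amax = np.maximum(amax, np.finfo(x.dtype).tiny)  # avoid log2(0)
    exp = np.ceil(np.log2(amax / 6.0))               # power-of-two exponent
    scales = 2.0 ** exp
    scaled = groups / scales
    # Round each scaled element to the nearest representable FP4 magnitude,
    # keeping the original sign.
    signs = np.sign(scaled)
    idx = np.abs(np.abs(scaled)[..., None] - FP4_VALUES).argmin(axis=-1)
    codes = signs * FP4_VALUES[idx]
    return codes, scales

def dequantize_mxfp4(codes, scales):
    """Reverse the quantization: multiply each group by its shared scale."""
    return (codes.reshape(-1, 32) * scales).ravel()
```

Such a pure-array path is slow but portable, which is exactly why it works as a fallback on older CUDA toolkits; the fast path would do the same grouping and rounding inside a fused kernel.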


1 participant