Conversation

@zhiyuan1i (Contributor) commented on May 6, 2025

This PR adds FP8-format quantization support to RWKV7. The actual matmul computation is still carried out in FP16 or BF16, as determined by PyTorch's `_scaled_mm` kernel. Since that kernel broadly supports SM75 and newer devices, I think it is reasonable to replace the existing int8 quantization support with it.
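For reviewers unfamiliar with the FP8 path, here is a minimal sketch of how per-tensor FP8 weight quantization and a scaled matmul via `torch._scaled_mm` fit together. This is not the code in this PR: the helper names (`quantize_fp8`, `fp8_linear`) are made up for illustration, and the exact `_scaled_mm` signature and supported devices vary across PyTorch releases.

```python
# Illustrative sketch only: per-tensor FP8 quantization plus a scaled matmul.
# Assumes a recent PyTorch where torch._scaled_mm takes scale_a/scale_b and
# out_dtype, and a GPU/backend that supports FP8 _scaled_mm.
import torch

FP8_DTYPE = torch.float8_e4m3fn
FP8_MAX = torch.finfo(FP8_DTYPE).max


def quantize_fp8(t: torch.Tensor):
    """Per-tensor symmetric quantization to FP8; returns (fp8_tensor, fp32_scale)."""
    scale = t.abs().max().clamp(min=1e-12) / FP8_MAX
    q = (t / scale).clamp(-FP8_MAX, FP8_MAX).to(FP8_DTYPE)
    return q, scale.float()


def fp8_linear(x: torch.Tensor, w_fp8: torch.Tensor, w_scale: torch.Tensor,
               out_dtype=torch.bfloat16) -> torch.Tensor:
    """y = x @ w.T with the activation quantized on the fly; output in BF16/FP16."""
    x_fp8, x_scale = quantize_fp8(x)
    # _scaled_mm expects the second operand in column-major layout,
    # so pass the transposed (pre-quantized) weight via .t().
    return torch._scaled_mm(
        x_fp8, w_fp8.t(),
        scale_a=x_scale, scale_b=w_scale,
        out_dtype=out_dtype,
    )


if __name__ == "__main__":
    x = torch.randn(16, 128, device="cuda", dtype=torch.bfloat16)
    w = torch.randn(256, 128, device="cuda", dtype=torch.bfloat16)
    w_fp8, w_scale = quantize_fp8(w)   # weights quantized once, ahead of time
    y = fp8_linear(x, w_fp8, w_scale)
    print(y.shape, y.dtype)            # torch.Size([16, 256]) torch.bfloat16
```

The key point is that only the storage and the inputs to the matmul are FP8; the output (and any downstream computation) stays in FP16/BF16, which is why `out_dtype` is requested explicitly.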

@zhiyuan1i marked this pull request as draft on May 6, 2025, 10:05
@zhiyuan1i marked this pull request as ready for review on May 6, 2025, 10:18