🚀 The feature, motivation and pitch
vLLM has announced support for running Llama 3.1 405B FP8 on 8xA100; see the blog post.
Does vLLM also support running DeepSeek-Coder-V2-Instruct-FP8 on 8xA100?
Since A100 (SM80) has no native FP8 tensor cores, FP8 checkpoints have to run through the FP8 Marlin mixed-precision (weight-only) kernel. However, I notice that vLLM uses a Triton kernel for FusedMoE, which doesn't support the FP8 Marlin mixed-precision path. See sgl-project/sglang#989 (comment)
Is there any workaround?
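For reference, a minimal sketch of the launch I have in mind, using vLLM's offline `LLM` API. The checkpoint id here is illustrative (any pre-quantized DeepSeek-Coder-V2-Instruct-FP8 repo), and it assumes vLLM can fall back to weight-only FP8 dequantization on Ampere:

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="DeepSeek-Coder-V2-Instruct-FP8",  # hypothetical repo id; substitute the actual quantized checkpoint
    tensor_parallel_size=8,   # shard across the 8 A100s
    quantization="fp8",       # A100 has no native FP8, so this would need the Marlin mixed-precision fallback
    trust_remote_code=True,   # DeepSeek models ship custom model code
)

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["def quicksort(arr):"], params)
print(outputs[0].outputs[0].text)
```

As far as I can tell, the dense layers could go through FP8 Marlin, but the MoE experts route through the Triton FusedMoE kernel, which is where this breaks on A100.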
Alternatives
No response
Additional context
No response