🚀 The feature, motivation and pitch
vLLM has announced support for running Llama 3.1 405B FP8 on 8xA100; see the blog post.
Does vLLM also support running DeepSeek-Coder-V2-Instruct-FP8 on 8xA100?
Since A100 (SM80) has no native FP8 tensor cores, FP8 checkpoints have to run through the FP8 Marlin mixed-precision (weight-only) kernel. However, I notice that vLLM uses a Triton kernel for FusedMoE, which doesn't support the FP8 Marlin mixed-precision path. See sgl-project/sglang#989 (comment)
Is there any workaround?
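For reference, a minimal sketch of the launch I have in mind, using vLLM's offline `LLM` API. The checkpoint id here is illustrative (any pre-quantized DeepSeek-Coder-V2-Instruct-FP8 repo), and it assumes vLLM can fall back to weight-only FP8 dequantization on Ampere:

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="DeepSeek-Coder-V2-Instruct-FP8",  # hypothetical repo id; substitute the actual quantized checkpoint
    tensor_parallel_size=8,   # shard across the 8 A100s
    quantization="fp8",       # A100 has no native FP8, so this would need the Marlin mixed-precision fallback
    trust_remote_code=True,   # DeepSeek models ship custom model code
)

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["def quicksort(arr):"], params)
print(outputs[0].outputs[0].text)
```

As far as I can tell, the dense layers could go through FP8 Marlin, but the MoE experts route through the Triton FusedMoE kernel, which is where this breaks on A100.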
Alternatives
No response
Additional context
No response