Description
Your current environment
The output of python collect_env.py
Your output of `python collect_env.py` here
🐛 Describe the bug
Please help me quantize this model, and tell me which flag I should use for quantization.
Here is my deployment spec for meta-llama/Llama-3.3-70B-Instruct:
```yaml
- --quantization
- fp8
- --kv-cache-dtype
- fp8
```
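For reference, the container args above should be equivalent to a `vllm serve` invocation like the one below. This is a sketch, not the exact deployment command: the tensor-parallel size is an assumption inferred from the `Worker_TP6` frames in the trace, and is not stated in the spec.

```shell
# Assumed equivalent CLI invocation (tensor-parallel size of 8 is a guess
# based on the Worker_TP6 frames in the stack trace below):
vllm serve meta-llama/Llama-3.3-70B-Instruct \
  --quantization fp8 \
  --kv-cache-dtype fp8 \
  --tensor-parallel-size 8
```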
Enabling `--quantization fp8` causes the deployment to fail.
Spec:
```
NVIDIA-SMI 550.127.08   Driver Version: 550.127.08   CUDA Version: 12.9
```
Error:
```
Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:42 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x80 (0x7f9b9b4d9eb0 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: + 0x111c7 (0x7f9b9b56c1c7 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #2: c10d::symmetric_memory::AllocationRef::~AllocationRef() + 0xce (0x7f9b3eb25ece in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::symmetric_memory::Block::~Block() + 0x1d2 (0x7f9b3eb2cc92 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::symmetric_memory::CUDASymmetricMemoryAllocator::free(void*) + 0x1de (0x7f9b3eb247fe in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: + 0x30ddc (0x7f9b9b478ddc in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #6: + 0x434a5f (0x7f9b8e172a5f in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_python.so)
frame #7: c10::TensorImpl::~TensorImpl() + 0x9 (0x7f9b9b4b7179 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #8: + 0x6c9e58 (0x7f9b8e407e58 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_python.so)
frame #9: + 0x6ca22d (0x7f9b8e40822d in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_python.so)
frame #10: VLLM::Worker_TP6() [0x59b914]
frame #11: VLLM::Worker_TP6() [0x59b914]
frame #12: VLLM::Worker_TP6() [0x53be84]
frame #13: VLLM::Worker_TP6() [0x59bded]
frame #14: VLLM::Worker_TP6() [0x5312d3]
```