[Bug]: DeepSeek R1 with CUTLASS MLA Broken on B200 #33627

@robertgshaw2-redhat

Description

Your current environment

The output of python collect_env.py
Your output of `python collect_env.py` here

🐛 Describe the bug

launch_mtp:
	chg run --gpus {{GPUS}} -- vllm serve {{MODEL}} -tp {{GPUS}} --speculative_config '{"num_speculative_tokens":1, "method":"deepseek_mtp"}' --port {{PORT}} --enforce-eager --attention-backend CUTLASS_MLA

I get:

(Worker_TP2 pid=404489) ERROR 02-02 20:46:37 [multiproc_executor.py:772]     super().__init__(
(Worker_TP2 pid=404489) ERROR 02-02 20:46:37 [multiproc_executor.py:772] TypeError: vllm.model_executor.layers.attention.mla_attention.MLACommonImpl.__init__() got multiple values for keyword argument 'q_pad_num_heads'
[rank0]:[W202 20:46:37.582305111 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
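For readers unfamiliar with this class of error: Python raises "got multiple values for keyword argument" when a call passes a keyword explicitly and the same key also arrives through a forwarded `**kwargs`. The sketch below reproduces that pattern in isolation. It is not vLLM code; `BaseImpl`, `PaddedImpl`, `PAD_TO`, and `try_construct` are hypothetical names, and the issue does not confirm that this is the exact code path that fails in the CUTLASS MLA backend.

```python
class BaseImpl:
    """Hypothetical stand-in for a common base class that accepts q_pad_num_heads."""

    def __init__(self, num_heads, q_pad_num_heads=None, **kwargs):
        self.num_heads = num_heads
        self.q_pad_num_heads = q_pad_num_heads


class PaddedImpl(BaseImpl):
    """Hypothetical subclass that always sets q_pad_num_heads itself."""

    PAD_TO = 128  # illustrative value, not taken from the issue

    def __init__(self, num_heads, **kwargs):
        # q_pad_num_heads is passed explicitly AND kwargs is forwarded.
        # If the caller also supplied q_pad_num_heads, the key appears twice
        # in the same call and Python raises TypeError.
        super().__init__(num_heads, q_pad_num_heads=self.PAD_TO, **kwargs)


def try_construct(**kwargs):
    """Return 'ok' on success, or the TypeError message on failure."""
    try:
        PaddedImpl(16, **kwargs)
        return "ok"
    except TypeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(try_construct())                    # caller does not pass the key
    print(try_construct(q_pad_num_heads=64))  # caller passes the key too
```

If the real failure has this shape, the usual remedies are to stop forwarding the key from the caller or to drop it from `kwargs` in the subclass before calling `super().__init__()`. Which one applies here depends on the actual call chain, which the traceback above only partially shows.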

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Type

    No type

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
