[Bug]: ValueError: data_parallel_rank is not applicable in headless mode in commit 657f2f3 #20399

@bbenshab

Your current environment

I'm working on a dev setup, so running collect_env.py is not an option. However, I traced the root cause to a commit pushed 2 hours ago, which explains why the same setup worked before then.

🐛 Describe the bug

When running vLLM in headless mode on a worker pod in a Kubernetes cluster, the command fails with the following error:

ValueError: data_parallel_rank is not applicable in headless mode

despite not explicitly setting the --data-parallel-rank or --data-parallel-start-rank flags in the command. The error originates from the run_headless function in vllm/entrypoints/cli/serve.py, specifically at this check:

https://github.com/vllm-project/vllm/blame/657f2f301a431542a731719fa8c6326deacc317d/vllm/entrypoints/cli/serve.py#L130

if parallel_config.data_parallel_rank is not None:
    raise ValueError("data_parallel_rank is not applicable in "
                     "headless mode")

parallel_config.data_parallel_rank is set to a non-None value internally, even though no rank-related flags are passed on the command line. This prevents the headless worker from starting and connecting to the master node (vllm-inference-pytorchjob-final-master-0:29500) for data-parallel coordination. The issue appears to have been introduced by commit 657f2f3: the same setup worked two hours before that change was pushed.
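To illustrate the suspected failure mode, here is a minimal standalone sketch. It does not import vLLM; `ParallelConfigSketch`, `resolve_rank`, and `try_headless` are hypothetical names made up for this example, and the idea that a default resolves to `0` instead of `None` is an assumption about how the value might end up non-None, not a confirmed diagnosis. Only the condition inside `headless_check` is taken from the check quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParallelConfigSketch:
    # Hypothetical stand-in for the real parallel config; only the
    # field relevant to this issue is modeled.
    data_parallel_rank: Optional[int] = None


def headless_check(parallel_config: ParallelConfigSketch) -> None:
    # Same condition as the check quoted from run_headless.
    if parallel_config.data_parallel_rank is not None:
        raise ValueError("data_parallel_rank is not applicable in "
                         "headless mode")


def resolve_rank(cli_value: Optional[int],
                 default: Optional[int]) -> Optional[int]:
    # Hypothetical resolution step: fall back to a default when the
    # user did not pass a rank flag.
    return cli_value if cli_value is not None else default


def try_headless(default: Optional[int]) -> str:
    # No rank flag is given (cli_value=None), as in the failing command.
    config = ParallelConfigSketch(
        data_parallel_rank=resolve_rank(None, default))
    try:
        headless_check(config)
    except ValueError as exc:
        return f"error: {exc}"
    return "ok"


if __name__ == "__main__":
    print(try_headless(default=None))  # passes the check
    print(try_headless(default=0))     # trips the check without any flag
```

If something like this is what happens, the check would need to distinguish "rank explicitly set by the user" from "rank filled in by a default", for example by keeping `None` as the unset value until after the headless check runs.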


Labels: bug

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions