Skip to content

DP: EPP Not Load Balancing chat/completions Requests #1767

@danehans

Description

@danehans

What happened:

EPP load balances between both vLLM DP ranks (--data-parallel-size=2) when using the completions API but not the chat/completions API.

What you expected to happen:

Both ranks to be used.

How to reproduce it (as minimally and precisely as possible):

Follow these steps.

Anything else we need to know?:

Metadata

Metadata

Assignees

Labels

kind/bugCategorizes issue or PR as related to a bug.triage/acceptedIndicates an issue or PR is ready to be actively worked on.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions