What happened:
EPP load balances between both vLLM DP ranks (--data-parallel-size=2) when using the completions API but not the chat/completions API.
What you expected to happen:
Both ranks to be used.
How to reproduce it (as minimally and precisely as possible):
Follow these steps.
Anything else we need to know?:
What happened:
EPP load balances between both vLLM DP ranks (
--data-parallel-size=2) when using thecompletionsAPI but not thechat/completionsAPI.What you expected to happen:
Both ranks to be used.
How to reproduce it (as minimally and precisely as possible):
Follow these steps.
Anything else we need to know?:
v20251019-9b6272eandv20251023-e2f14ff.