If user set acceleratorResource, respect the value #141
Conversation
Also added a temporary, basic test suite to validate the chart's rendering logic. It is intended as a stopgap until a more formal testing framework is in place.
@@ -0,0 +1,14 @@
# Test values for default accelerator resource behavior.
# The chart should automatically set the GPU count to match tensor parallelism.
Should the default match tensor parallelism x data parallelism?
modelCommand: vllmServe
resources:
  limits:
    nvidia.com/gpu: "8" # User-defined value
Would this still work if I want to set gpu to 0? For example, for a vLLM simulator that doesn't require GPUs but whose args still use tensor-parallel-size=2.
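A hedged sketch of that scenario, for illustration only. The `modelCommand` and `resources` keys come from the diff above; the args key is an assumption and may not match the chart's actual API:

```yaml
# Illustrative only: a GPU-free vLLM simulator that still passes a tensor-parallel flag.
modelCommand: vllmServe
resources:
  limits:
    nvidia.com/gpu: "0"          # explicitly zero; the chart should respect this, not override it
vllmArgs:                        # hypothetical key name for engine args
  - "--tensor-parallel-size=2"   # parallelism arg kept even though no GPUs are requested
```

The question is whether an explicit "0" is treated as "user set a value" (respected) or as "unset" (overwritten by the computed default).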
echo "Running Helm template rendering tests..."
echo "========================================"
This is really nice. I wonder if you want to include this as part of the Lint/Test Chart GitHub Actions workflow.
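If that sounds useful, a rough sketch of what the addition could look like. The job name and the test script path are assumptions and would need to match the repo's actual layout:

```yaml
# Hypothetical job for the existing Lint/Test Chart workflow.
jobs:
  template-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: azure/setup-helm@v4
      - name: Run Helm template rendering tests
        run: bash ./tests/run-tests.sh   # assumed script location
```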
Agree that there is a problem. However, I think the solution is incomplete. My understanding of the concepts and relationships:

- Data parallelism (dp) indicates the number of replicas of the full model. Each replica corresponds to a vLLM engine.
- These vLLM engines can run in the same pod ("single node") or in different pods ("multi-node"). In all cases, the total number of GPUs needed is dp * tp.
- In a multi-node scenario there is also the dp local (dpl) size (a vLLM option), the number of engines per pod, and the number of worker pods (w). In the case of a single pod (w = 1), dp = dpl.
- For a given pod, the number of GPUs required is dpl * tp.

There are 4 variables: tp, dp, dpl, w. tp is always required (default 1). Any 2 of the remaining 3 allow us to compute the third and the number of GPUs per pod. Today modelservice allows specifying only 2 of these (tp, dp), which is sufficient for the single-node case. I propose allowing the user to specify dpl and w as well. Only 2 are required (the default for all is 1). It is easiest if the user specifies dpl and w; then dp = dpl * w and GPUs per pod = dpl * tp (see the sketch below). If the user specifies other combinations, they have to be sure to get the ratios correct.
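A minimal numeric sketch of those relationships, assuming hypothetical values-file field names (the chart's actual keys may differ):

```yaml
# Illustrative only: field names below are hypothetical, not the chart's current API.
# With tp = 2, dpl = 2, w = 2:
#   dp         = dpl * w  = 4   # total engine replicas
#   GPUs/pod   = dpl * tp = 4   # what each pod's nvidia.com/gpu limit should be
#   total GPUs = dp * tp  = 8
parallelism:
  tensor: 2      # tp
  dataLocal: 2   # dpl, engines per pod
  workers: 2     # w, number of pods
```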
These changes have been incorporated into #159. Closing.
Fixes #140