README.md: 65 changes (34 additions & 31 deletions)
@@ -58,37 +58,40 @@ Check Helm's [official docs](https://helm.sh/docs/intro/using_helm/) for more gu

## Values
Below are the values you can set; a minimal example values file follows the table.
| Key | Description | Type | Default |
|----------------------------------------|--------------------------------------------------------------------------------------------------------------------|-----------------|---------------------------------------------|
| `modelArtifacts.name` | Name of the model in the form `namespace/modelId`. Required. | string | N/A |
| `modelArtifacts.uri` | Model artifacts URI. Supported formats include `hf://`, `pvc://`, and `oci://`. | string | N/A |
| `modelArtifacts.size` | Size of the emptyDir volume created for downloading the model. | string | N/A |
| `modelArtifacts.authSecretName` | Name of the Secret containing `HF_TOKEN` for `hf://` artifacts that require a token to download the model. | string | N/A |
| `modelArtifacts.mountPath` | Path at which to mount the volume created to store models. | string | `/model-cache` |
| `multinode` | Whether to create P/D workloads as Deployments (`false`) or LeaderWorkerSets (`true`). | bool | `false` |
| `routing.servicePort` | Port the routing proxy sidecar listens on. If there is no sidecar, this is the port the request goes to. | int | N/A |
| `routing.proxy.image` | Image used for the sidecar. | string | `ghcr.io/llm-d/llm-d-routing-sidecar:0.0.6` |
| `routing.proxy.targetPort` | Port the vLLM decode container listens on. If the proxy is present, it forwards requests to this port. | string | N/A |
| `routing.proxy.debugLevel` | Debug level of the routing proxy. | int | 5 |
| `routing.proxy.parentRefs[*].name` | Name of the inference gateway. | string | N/A |
| `decode.create` | If true, creates the decode Deployment or LeaderWorkerSet. | bool | `true` |
| `decode.annotations` | Annotations added to the Deployment or LeaderWorkerSet. | dict | {} |
| `decode.tolerations` | Tolerations added to the Deployment or LeaderWorkerSet. | list | [] |
| `decode.replicas` | Number of replicas for decode pods. | int | 1 |
| `decode.extraConfig` | Extra pod configuration. | dict | {} |
| `decode.containers[*].name` | Name of the container for the decode Deployment/LWS. | string | N/A |
| `decode.containers[*].image` | Image of the container for the decode Deployment/LWS. | string | N/A |
| `decode.containers[*].args` | List of arguments for the decode container. | list[string] | [] |
| `decode.containers[*].modelCommand` | Nature of the command. One of `vllmServe`, `imageDefault`, or `custom`. | string | `imageDefault` |
| `decode.containers[*].command` | Command for the decode container, as a list of strings. | list[string] | [] |
| `decode.containers[*].ports` | List of ports for the decode container. | list[Port] | [] |
| `decode.containers[*].extraConfig` | Extra container configuration. | dict | {} |
| `decode.initContainers` | List of initContainers to add (in addition to the routing proxy, if enabled). | list[Container] | N/A |
| `decode.parallelism.tensor` | Amount of tensor parallelism. | int | 1 |
| `decode.parallelism.data` | Amount of data parallelism. | int | 1 |
| `decode.parallelism.dataLocal` | Amount of data-local parallelism. | int | 1 |
| `decode.parallelism.workers` | Number of workers over which data parallelism is implemented. | int | 1 |
| `decode.acceleratorTypes.labelKey` | Key of the node label that identifies the hosted GPU type. | string | N/A |
| `decode.acceleratorTypes.labelValue` | Value of the node label that identifies the hosted GPU type. | string | N/A |
| `prefill` | Supports the same fields as `decode`. | See above | See above |
| `extraObjects` | Additional Kubernetes objects to deploy alongside the main application. | list | [] |
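
For orientation, here is a minimal sketch of a values file assembled from the keys in the table above. The model name, image, ports, and gateway name are illustrative placeholders rather than chart defaults, and only a subset of the keys is shown.

```yaml
# Minimal sketch of a values file for this chart.
# Model name, image, ports, and gateway name are illustrative
# placeholders, not defaults shipped with the chart.
modelArtifacts:
  name: my-org/my-model            # required, in the form namespace/modelId
  uri: hf://my-org/my-model        # hf://, pvc://, or oci://
  size: 20Gi                       # emptyDir size for the downloaded model
  authSecretName: hf-token-secret  # Secret holding HF_TOKEN, if the model is gated
  mountPath: /model-cache

multinode: false                   # Deployments (false) or LeaderWorkerSets (true)

routing:
  servicePort: 8000                # port the routing sidecar (or decode pod) listens on
  proxy:
    image: ghcr.io/llm-d/llm-d-routing-sidecar:0.0.6
    targetPort: "8200"             # port the vLLM decode container listens on
    debugLevel: 5
    parentRefs:
      - name: my-inference-gateway # name of the inference gateway

decode:
  create: true
  replicas: 1
  containers:
    - name: vllm
      image: my-registry/vllm-openai:latest  # placeholder image
      modelCommand: vllmServe                # vllmServe | imageDefault | custom
      args:
        - "--tensor-parallel-size=1"
      ports:
        - containerPort: 8200
  parallelism:
    tensor: 1
    data: 1

prefill:
  create: true
  replicas: 1
```

A file like this would typically be passed to `helm install` or `helm upgrade` via `-f`.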

## Contribute

charts/llm-d-modelservice/Chart.yaml: 2 changes (1 addition & 1 deletion)
@@ -13,7 +13,7 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: "v0.3.4"
version: "v0.3.5"
# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.