README.md: 65 changes (34 additions & 31 deletions)
@@ -58,37 +58,40 @@ Check Helm's [official docs](https://helm.sh/docs/intro/using_helm/) for more gu

## Values
Below are the values you can set; a minimal example values file follows the table.
| Key | Description | Type | Default |
|----------------------------------------|--------------------------------------------------------------------------------------------------------------------|-----------------|---------------------------------------------|
| `modelArtifacts.name` | Name of the model in the form `namespace/modelId`. Required. | string | N/A |
| `modelArtifacts.uri` | Model artifacts URI. Supported formats include `hf://`, `pvc://`, and `oci://`. | string | N/A |
| `modelArtifacts.size` | Size of the emptyDir volume created for downloading the model. | string | N/A |
| `modelArtifacts.authSecretName` | Name of the Secret containing `HF_TOKEN` for `hf://` artifacts that require a token to download the model. | string | N/A |
| `modelArtifacts.mountPath` | Path at which to mount the volume created to store models. | string | `/model-cache` |
| `multinode` | Whether to create P/D workloads as Deployments (`false`) or LeaderWorkerSets (`true`). | bool | `false` |
| `routing.servicePort` | Port the routing proxy sidecar listens on. If there is no sidecar, this is the port the request goes to. | int | N/A |
| `routing.proxy.image` | Image used for the sidecar. | string | `ghcr.io/llm-d/llm-d-routing-sidecar:0.0.6` |
| `routing.proxy.targetPort` | Port the vLLM decode container listens on. If the proxy is present, it forwards requests to this port. | string | N/A |
| `routing.proxy.debugLevel` | Debug level of the routing proxy. | int | 5 |
| `routing.proxy.parentRefs[*].name` | Name of the inference gateway. | string | N/A |
| `decode.create` | If true, creates the decode Deployment or LeaderWorkerSet. | bool | `true` |
| `decode.annotations` | Annotations added to the Deployment or LeaderWorkerSet. | dict | {} |
| `decode.tolerations` | Tolerations added to the Deployment or LeaderWorkerSet. | list | [] |
| `decode.replicas` | Number of replicas for decode pods. | int | 1 |
| `decode.extraConfig` | Extra pod configuration. | dict | {} |
| `decode.containers[*].name` | Name of the container for the decode Deployment/LWS. | string | N/A |
| `decode.containers[*].image` | Image of the container for the decode Deployment/LWS. | string | N/A |
| `decode.containers[*].args` | List of arguments for the decode container. | list[string] | [] |
| `decode.containers[*].modelCommand` | Nature of the command. One of `vllmServe`, `imageDefault`, or `custom`. | string | `imageDefault` |
| `decode.containers[*].command` | Command for the decode container, as a list of strings. | list[string] | [] |
| `decode.containers[*].ports` | List of ports for the decode container. | list[Port] | [] |
| `decode.containers[*].extraConfig` | Extra container configuration. | dict | {} |
| `decode.initContainers` | List of initContainers to add (in addition to the routing proxy, if enabled). | list[Container] | N/A |
| `decode.parallelism.tensor` | Amount of tensor parallelism. | int | 1 |
| `decode.parallelism.data` | Amount of data parallelism. | int | 1 |
| `decode.parallelism.dataLocal` | Amount of data-local parallelism. | int | 1 |
| `decode.parallelism.workers` | Number of workers over which data parallelism is implemented. | int | 1 |
| `decode.acceleratorTypes.labelKey` | Key of the node label that identifies the hosted GPU type. | string | N/A |
| `decode.acceleratorTypes.labelValue` | Value of the node label that identifies the hosted GPU type. | string | N/A |
| `prefill` | Supports the same fields as `decode`. | See above | See above |
| `extraObjects` | Additional Kubernetes objects to deploy alongside the main application. | list | [] |
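
For orientation, here is a minimal sketch of a values file assembled from the keys in the table above. The model name, image, ports, and gateway name are illustrative placeholders rather than chart defaults, and only a subset of the keys is shown.

```yaml
# Minimal sketch of a values file for this chart.
# Model name, image, ports, and gateway name are illustrative
# placeholders, not defaults shipped with the chart.
modelArtifacts:
  name: my-org/my-model            # required, in the form namespace/modelId
  uri: hf://my-org/my-model        # hf://, pvc://, or oci://
  size: 20Gi                       # emptyDir size for the downloaded model
  authSecretName: hf-token-secret  # Secret holding HF_TOKEN, if the model is gated
  mountPath: /model-cache

multinode: false                   # Deployments (false) or LeaderWorkerSets (true)

routing:
  servicePort: 8000                # port the routing sidecar (or decode pod) listens on
  proxy:
    image: ghcr.io/llm-d/llm-d-routing-sidecar:0.0.6
    targetPort: "8200"             # port the vLLM decode container listens on
    debugLevel: 5
    parentRefs:
      - name: my-inference-gateway # name of the inference gateway

decode:
  create: true
  replicas: 1
  containers:
    - name: vllm
      image: my-registry/vllm-openai:latest  # placeholder image
      modelCommand: vllmServe                # vllmServe | imageDefault | custom
      args:
        - "--tensor-parallel-size=1"
      ports:
        - containerPort: 8200
  parallelism:
    tensor: 1
    data: 1

prefill:
  create: true
  replicas: 1
```

A file like this would typically be passed to `helm install` or `helm upgrade` via `-f`.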

## Contribute

charts/llm-d-modelservice/Chart.yaml: 2 changes (1 addition & 1 deletion)
@@ -13,7 +13,7 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: "v0.3.4"
version: "v0.3.5"
# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.