[Standup] Serve models from read-only pvcs #553

maugustosilva · 2025-12-05T22:04:11Z

* The `pvc` holding the models mounted by the `pods` is now **read-only**, and the environment variable `VLLM_CACHE_ROOT` (default value `/tmp/vllm`) is set automatically
* The variables `VLLM_CACHE_ROOT`, `VLLM_WORKER_MULTIPROC_METHOD`, `VLLM_ALLOW_LONG_MAX_MODEL_LEN`, `VLLM_SERVER_DEV_MODE`, `VLLM_LOAD_FORMAT`, `VLLM_LOGGING_LEVEL` and `VLLM_ENABLE_SLEEP_MODE` are now part of `LLMDBENCH_VLLM_COMMON` and are automatically added to the environment (ready to be used by `vllm`)
* **IMPORTANT** Any environment variable named `LLMDBENCH_VLLM_COMMON_VLLM_<SOMETHING>`, `LLMDBENCH_VLLM_STANDALONE_VLLM_<SOMETHING>`, `LLMDBENCH_VLLM_MODELSERVICE_PREFILL_VLLM_<SOMETHING>` or `LLMDBENCH_VLLM_MODELSERVICE_DECODE_VLLM_<SOMETHING>` is exported within the `pod` as `VLLM_<SOMETHING>`
* Annotations `stood-up-by`, `stood-up-from` and `stood-up-via` are automatically added to both `standalone` and `modelservice` pods
* Added a selector for the `python` implementation of `kubectl_get`
* Added a `python` implementation of `kubectl_delete`
* Extra command line parameters required when `VLLM_ENABLE_SLEEP_MODE` is set to `true` are now rendered inside `add_command_line_options`
* Fix for `get_model_name_from_pod`. Get kubeconfig from current context.
* More informative error messages when `run.sh` cannot locate a stack
* Removed the environment variable `LLMDBENCH_HARNESS_CONTAINER_IMAGE`
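
The prefix rule for `VLLM_<SOMETHING>` variables described above can be illustrated with a minimal Python sketch. It is not the code from this PR: the function name `pod_env`, the `scope` argument, and the choice to let scope-specific values override `COMMON` ones are all assumptions made for illustration.

```python
def pod_env(env, scope):
    """Return the VLLM_* variables a pod of the given scope would see.

    `scope` is one of "STANDALONE", "MODELSERVICE_PREFILL" or
    "MODELSERVICE_DECODE" (hypothetical argument, not from the PR).
    COMMON variables are applied first; letting scope-specific values
    override them is an assumption of this sketch.
    """
    exported = {}
    for prefix in ("LLMDBENCH_VLLM_COMMON_", f"LLMDBENCH_VLLM_{scope}_"):
        for name, value in env.items():
            if not name.startswith(prefix):
                continue
            suffix = name[len(prefix):]
            # Only names of the form <prefix>VLLM_<SOMETHING> are exported.
            if suffix.startswith("VLLM_") and suffix != "VLLM_":
                exported[suffix] = value
    return exported


if __name__ == "__main__":
    example = {
        "LLMDBENCH_VLLM_COMMON_VLLM_CACHE_ROOT": "/tmp/vllm",
        "LLMDBENCH_VLLM_STANDALONE_VLLM_LOGGING_LEVEL": "DEBUG",
        "LLMDBENCH_VLLM_MODELSERVICE_DECODE_VLLM_LOAD_FORMAT": "auto",
    }
    print(pod_env(example, "STANDALONE"))
```

With these example values, a `standalone` pod would receive `VLLM_CACHE_ROOT` and `VLLM_LOGGING_LEVEL`, while `VLLM_LOAD_FORMAT` would only reach a `modelservice` decode pod.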

Signed-off-by: maugustosilva <[email protected]>

maugustosilva requested review from kalantar and manoelmarques December 5, 2025 22:04

kalantar mentioned this pull request Dec 6, 2025

wide-ep scenario #555

Open
