@maugustosilva

* The `pvc` holding the models, as mounted by the `pods`, is now
  **read-only**, and the environment variable `VLLM_CACHE_ROOT`
  (default value `/tmp/vllm`) is set automatically
* Made the variables `VLLM_CACHE_ROOT`, `VLLM_WORKER_MULTIPROC_METHOD`,
  `VLLM_ALLOW_LONG_MAX_MODEL_LEN`, `VLLM_SERVER_DEV_MODE`,
  `VLLM_LOAD_FORMAT`, `VLLM_LOGGING_LEVEL` and `VLLM_ENABLE_SLEEP_MODE`
  part of `LLMDBENCH_VLLM_COMMON`; they are automatically added to the
  environment (ready to be used by `vllm`)
* **IMPORTANT** Any environment variable whose name has the form
  `LLMDBENCH_VLLM_COMMON_VLLM_<SOMETHING>`,
  `LLMDBENCH_VLLM_STANDALONE_VLLM_<SOMETHING>`,
  `LLMDBENCH_VLLM_MODELSERVICE_PREFILL_VLLM_<SOMETHING>` or
  `LLMDBENCH_VLLM_MODELSERVICE_DECODE_VLLM_<SOMETHING>` is exported
  within the `pod` as `VLLM_<SOMETHING>`
* Annotations `stood-up-by`, `stood-up-from` and `stood-up-via` are
  automatically added to both `standalone` and `modelservice` pods
* Added a selector for the `python` implementation of `kubectl_get`
* Added a `python` implementation of `kubectl_delete`
* Extra command line parameters required when `VLLM_ENABLE_SLEEP_MODE`
  is set to `true` are now rendered inside `add_command_line_options`
* Fix for `get_model_name_from_pod`: the kubeconfig is now taken from
  the current context
* More informative error messages when `run.sh` cannot locate a stack
* Removed the environment variable `LLMDBENCH_HARNESS_CONTAINER_IMAGE`
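
To illustrate the **IMPORTANT** naming rule above, here is a minimal sketch of the prefix-stripping it describes. The four prefixes are taken from the list above. The function names `exported_name` and `pod_env` are hypothetical and are not the helpers used in this repository, and how the real code resolves two prefixed variables that map to the same `VLLM_<SOMETHING>` is not stated here (in this sketch the last one seen wins).

```python
# Hypothetical sketch of the naming rule; not the repository's implementation.
_PREFIXES = (
    "LLMDBENCH_VLLM_COMMON_",
    "LLMDBENCH_VLLM_STANDALONE_",
    "LLMDBENCH_VLLM_MODELSERVICE_PREFILL_",
    "LLMDBENCH_VLLM_MODELSERVICE_DECODE_",
)


def exported_name(name):
    """Return the VLLM_<SOMETHING> name for a matching variable, else None."""
    for prefix in _PREFIXES:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        # Require a non-empty <SOMETHING> after the inner "VLLM_" marker.
        if rest.startswith("VLLM_") and len(rest) > len("VLLM_"):
            return rest
    return None


def pod_env(env):
    """Build the VLLM_* variables a pod would see from a mapping of variables.

    Assumption: on a name collision the last matching entry wins.
    """
    result = {}
    for name, value in env.items():
        target = exported_name(name)
        if target is not None:
            result[target] = value
    return result
```

Under these assumptions, `pod_env({"LLMDBENCH_VLLM_COMMON_VLLM_CACHE_ROOT": "/tmp/vllm"})` yields a mapping with the single key `VLLM_CACHE_ROOT`, while unrelated `LLMDBENCH_*` variables are ignored.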

Signed-off-by: maugustosilva <[email protected]>