From 8d9023393dd6e7eec39c346af61af2e88fdda84b Mon Sep 17 00:00:00 2001
From: Iacopo Poli
Date: Tue, 4 Mar 2025 16:39:14 +0100
Subject: [PATCH 1/2] nginx guide: remove privileged from vllm container run
 and target device ID

Signed-off-by: Iacopo Poli
---
 docs/source/deployment/nginx.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/deployment/nginx.md b/docs/source/deployment/nginx.md
index 87feb4885685..b750deb78f93 100644
--- a/docs/source/deployment/nginx.md
+++ b/docs/source/deployment/nginx.md
@@ -101,8 +101,8 @@ Notes:
 ```console
 mkdir -p ~/.cache/huggingface/hub/
 hf_cache_dir=~/.cache/huggingface/
-docker run -itd --ipc host --privileged --network vllm_nginx --gpus all --shm-size=10.24gb -v $hf_cache_dir:/root/.cache/huggingface/ -p 8081:8000 --name vllm0 vllm --model meta-llama/Llama-2-7b-chat-hf
-docker run -itd --ipc host --privileged --network vllm_nginx --gpus all --shm-size=10.24gb -v $hf_cache_dir:/root/.cache/huggingface/ -p 8082:8000 --name vllm1 vllm --model meta-llama/Llama-2-7b-chat-hf
+docker run -itd --ipc host --network vllm_nginx --gpus device=0 --shm-size=10.24gb -v $hf_cache_dir:/root/.cache/huggingface/ -p 8081:8000 --name vllm0 vllm --model meta-llama/Llama-2-7b-chat-hf
+docker run -itd --ipc host --network vllm_nginx --gpus device=1 --shm-size=10.24gb -v $hf_cache_dir:/root/.cache/huggingface/ -p 8082:8000 --name vllm1 vllm --model meta-llama/Llama-2-7b-chat-hf
 ```
 
 :::{note}

From a07088483c5ea8e924556cb73d78d383f338ad68 Mon Sep 17 00:00:00 2001
From: Iacopo Poli
Date: Wed, 5 Mar 2025 10:47:29 +0100
Subject: [PATCH 2/2] update instruction for cpu backend

Signed-off-by: Iacopo Poli
---
 docs/source/deployment/nginx.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/deployment/nginx.md b/docs/source/deployment/nginx.md
index b750deb78f93..62816f514c00 100644
--- a/docs/source/deployment/nginx.md
+++ b/docs/source/deployment/nginx.md
@@ -95,7 +95,7 @@ Notes:
 
 - If you have your HuggingFace models cached somewhere else, update `hf_cache_dir` below.
 - If you don't have an existing HuggingFace cache you will want to start `vllm0` and wait for the model to complete downloading and the server to be ready. This will ensure that `vllm1` can leverage the model you just downloaded and it won't have to be downloaded again.
--  The below example assumes GPU backend used. If you are using CPU backend, remove `--gpus all`, add `VLLM_CPU_KVCACHE_SPACE` and `VLLM_CPU_OMP_THREADS_BIND` environment variables to the docker run command.
+-  The below example assumes GPU backend used. If you are using CPU backend, remove `--gpus device=ID`, add `VLLM_CPU_KVCACHE_SPACE` and `VLLM_CPU_OMP_THREADS_BIND` environment variables to the docker run command.
 - Adjust the model name that you want to use in your vLLM servers if you don't want to use `Llama-2-7b-chat-hf`.
 
 ```console
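
For reviewers: the CPU-backend adjustment described in the second patch's note could look roughly like the sketch below. It is not part of the patch itself; the env var values (`40` GiB of KV-cache space, a `0-29` thread binding) are illustrative assumptions, not values the patch prescribes.

```console
hf_cache_dir=~/.cache/huggingface/
# Same command as the guide's vllm0, with --gpus device=0 removed and the
# two CPU-backend environment variables from the note added (values are
# examples only; tune them to your machine).
docker run -itd --ipc host --network vllm_nginx \
  -e VLLM_CPU_KVCACHE_SPACE=40 \
  -e VLLM_CPU_OMP_THREADS_BIND=0-29 \
  --shm-size=10.24gb \
  -v $hf_cache_dir:/root/.cache/huggingface/ \
  -p 8081:8000 --name vllm0 vllm --model meta-llama/Llama-2-7b-chat-hf
```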