Commit 3d134d2 (1 parent: 1f827d4)

Fix the vLLM docker compose issues (#134)

* refine the vLLM docker compose
* update the vllm openai api call
* refine the default network configuration in the docker-compose
* refine the network config of docker compose and launch service
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)

Signed-off-by: tianyil1 <tianyi.liu@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

File tree

3 files changed: +7 additions, -4 deletions

comps/llms/text-generation/vllm/docker_compose_llm.yaml
(3 additions, 1 deletion)

@@ -14,9 +14,10 @@ services:
     environment:
       http_proxy: ${http_proxy}
       https_proxy: ${https_proxy}
+      no_proxy: ${no_proxy}
       LLM_MODEL_ID: ${LLM_MODEL_ID}
       HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
-    command: cd / && export VLLM_CPU_KVCACHE_SPACE=40 && python3 -m vllm.entrypoints.openai.api_server --model $LLM_MODEL_ID --port 80
+    command: /bin/sh -c "cd / && export VLLM_CPU_KVCACHE_SPACE=40 && python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --model $LLM_MODEL_ID --port 80"
   llm:
     image: opea/gen-ai-comps:llm-vllm-server
     container_name: llm-vllm-server
@@ -26,6 +27,7 @@ services:
     environment:
       http_proxy: ${http_proxy}
       https_proxy: ${https_proxy}
+      no_proxy: ${no_proxy}
       vLLM_LLM_ENDPOINT: ${vLLM_LLM_ENDPOINT}
       LLM_MODEL_ID: ${LLM_MODEL_ID}
       HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
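The corrected `command` above wraps the shell pipeline in `/bin/sh -c` (so `cd` and `export` actually run inside a shell) and starts vLLM's OpenAI-compatible API server on port 80. As a minimal sketch of what a client would send to that server, the snippet below builds a `/v1/completions` request body; the model name mirrors the default used elsewhere in this commit, and the actual HTTP call is left as a comment since the host depends on your deployment:

```python
import json

def build_completion_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build the JSON body for vLLM's OpenAI-compatible /v1/completions endpoint."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}

payload = build_completion_request("meta-llama/Meta-Llama-3-8B-Instruct", "Hello")
# POST this to the service, e.g. with requests (host/port depend on your setup):
#   requests.post("http://localhost:80/v1/completions", json=payload).json()
print(json.dumps(payload))
```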

comps/llms/text-generation/vllm/launch_vllm_service.sh
(1 addition, 1 deletion)

@@ -22,4 +22,4 @@ fi
 volume=$PWD/data

 # Build the Docker run command based on the number of cards
-docker run -it --rm --name="ChatQnA_server" -p $port_number:$port_number --network=host -v $volume:/data -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$https_proxy -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} vllm:cpu /bin/bash -c "cd / && export VLLM_CPU_KVCACHE_SPACE=40 && python3 -m vllm.entrypoints.openai.api_server --model $model_name --port $port_number"
+docker run -it --rm --name="ChatQnA_server" -p $port_number:$port_number --network=host -v $volume:/data -e HTTPS_PROXY=$https_proxy -e HTTP_PROXY=$https_proxy -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} vllm:cpu /bin/bash -c "cd / && export VLLM_CPU_KVCACHE_SPACE=40 && python3 -m vllm.entrypoints.openai.api_server --model $model_name --host 0.0.0.0 --port $port_number"

comps/llms/text-generation/vllm/llm.py
(3 additions, 2 deletions)

@@ -5,6 +5,7 @@
 from fastapi.responses import StreamingResponse
 from langchain_community.llms import VLLMOpenAI
+from langsmith import traceable

 from comps import GeneratedDoc, LLMParamsDoc, ServiceType, opea_microservices, opea_telemetry, register_microservice

@@ -28,12 +29,12 @@ def post_process_text(text: str):
     host="0.0.0.0",
     port=9000,
 )
-@opea_telemetry
+@traceable(run_type="llm")
 def llm_generate(input: LLMParamsDoc):
     llm_endpoint = os.getenv("vLLM_LLM_ENDPOINT", "http://localhost:8080")
     llm = VLLMOpenAI(
         openai_api_key="EMPTY",
-        endpoint_url=llm_endpoint + "/v1",
+        openai_api_base=llm_endpoint + "/v1",
         max_tokens=input.max_new_tokens,
         model_name=os.getenv("LLM_MODEL_ID", "meta-llama/Meta-Llama-3-8B-Instruct"),
         top_p=input.top_p,
```