
Add parameters for Qwen2.5-vl-7b-instruct model#47

Open
prernanookala-ai wants to merge 2 commits into opea-project:main from cld2labs:feature/prerna-model-config

Conversation

@prernanookala-ai

No description provided.

  tensor_parallel_size: "{{ .Values.tensor_parallel_size }}"
  pipeline_parallel_size: "{{ .Values.pipeline_parallel_size }}"

"Qwen/Qwen2.5-VL-7B-Instruct":
Collaborator


Thanks for adding support for this VL model. For better performance on Xeon, include these additional variables and extra command arguments. Also, tensor parallel size is calculated dynamically based on the system configuration where the models are deployed.

configMapValues:
  VLLM_CPU_KVCACHE_SPACE: "40"
  VLLM_RPC_TIMEOUT: "100000"
  VLLM_ALLOW_LONG_MAX_MODEL_LEN: "1"
  VLLM_ENGINE_ITERATION_TIMEOUT_S: "120"
  VLLM_CPU_NUM_OF_RESERVED_CPU: "0"
  VLLM_CPU_SGL_KERNEL: "1"
  HF_HUB_DISABLE_XET: "1"
extraCmdArgs:
  [
    "--block-size", "128",
    "--dtype", "bfloat16",
    "--distributed_executor_backend", "mp",
    "--enable_chunked_prefill",
    "--enforce-eager",
    "--max-model-len", "33024",
    "--max-num-batched-tokens", "2048",
    "--max-num-seqs", "256",
  ]
tensor_parallel_size: "{{ .Values.tensor_parallel_size }}"
pipeline_parallel_size: "{{ .Values.pipeline_parallel_size }}"
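(For context: a minimal sketch of how the templated parallelism fields above might be supplied at the top level of the values file. The key names mirror the Helm references in the snippet; the "1" defaults are assumptions for illustration only, since per the note above the chart computes tensor parallelism dynamically from the host configuration.)

# Hypothetical top-level values consumed by the templated fields above;
# the chart reportedly overrides tensor parallelism dynamically based on
# the system configuration where the model is deployed.
tensor_parallel_size: "1"
pipeline_parallel_size: "1"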

Author


Thanks for the suggestions!
I’ve updated xeon-values.yaml to include the additional configMap values and extra command arguments as suggested.
Please let me know if anything else needs adjustment.

@prernanookala-ai
Author

Updated the PR per the review comments. Ready for another look.

@prernanookala-ai force-pushed the feature/prerna-model-config branch from af44a36 to 17173c5 on January 31, 2026 at 00:10
@prernanookala-ai force-pushed the feature/prerna-model-config branch from 17173c5 to bd1f8d0 on February 11, 2026 at 02:41
Signed-off-by: prernanookala-ai <prerna.nookala@cloud2labs.com>
Signed-off-by: prernanookala-ai <prerna.nookala@cloud2labs.com>
@prernanookala-ai force-pushed the feature/prerna-model-config branch from bd1f8d0 to 8c889dd on February 11, 2026 at 03:19
@prernanookala-ai
Author

Updated commit author and signed-off-by email to correct company domain. No code changes.

VLLM_ALLOW_LONG_MAX_MODEL_LEN: "1"
VLLM_ENGINE_ITERATION_TIMEOUT_S: "120"
VLLM_CPU_NUM_OF_RESERVED_CPU: "0"
VLLM_CPU_SGL_KERNEL: "1"
@zahidulhaque

@prernanookala-ai, Have you tried running Qwen2.5-VL-7B-Instruct with this patch? When I used these settings before, the model either failed to start or the server crashed on a /chat/completions request.

For testing, you can use this curl command:

curl -X POST "http://<host-ip>:<port>/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Qwen/Qwen2.5-VL-7B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
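If the server is up and healthy, the endpoint should return an OpenAI-style chat completion (vLLM's server exposes an OpenAI-compatible API). A rough sketch of the response shape, with placeholder values only:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "Qwen/Qwen2.5-VL-7B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "..." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0 }
}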

@HarikaDev296
Contributor

@zahidulhaque - You're right, this patch is not working for me either.

Below are the values I've tested and found to be stable for Qwen2.5-VL-7B-Instruct. Setting VLLM_CPU_KVCACHE_SPACE to 16 and disabling Triton resolved the issue on my end.

"Qwen/Qwen2.5-VL-7B-Instruct":
configMapValues:
VLLM_SKIP_WARMUP: "true"
VLLM_CPU_KVCACHE_SPACE: "16"
VLLM_DISABLE_TRITON: "1"
extraCmdArgs: ["--max-model-len","8192"]
tensor_parallel_size: "1"

Let me know if I can update the PR with these values.

@zahidulhaque

Sure @HarikaDev296, if things are working fine with the above configuration, you can go ahead and update the code.

@zahidulhaque

Also, VLLM_CPU_KVCACHE_SPACE: "16" might be too little for multimodal models. Try setting it to at least 40, and make sure to test with the curl command once the server is up.

@HarikaDev296
Contributor

@zahidulhaque - I tested the Qwen model with the config below; after increasing VLLM_CPU_KVCACHE_SPACE to 40, I was able to run inference successfully.
"Qwen/Qwen2.5-VL-7B-Instruct":
configMapValues:
VLLM_SKIP_WARMUP: "true"
VLLM_CPU_KVCACHE_SPACE: "40"
VLLM_DISABLE_TRITON: "1"
extraCmdArgs: ["--max-model-len","8192"]
tensor_parallel_size: "1"
