[Bug]: qwen3-30b-a3b on OVMS works on CPU, crashes with out of memory on iGPU #34187

@ikirsh

OpenVINO Version

2025.4.1, 2026.0

Operating System

Other (Please specify in description)

Device used for inference

GPU

Framework

None

Model used

qwen3-30b-a3b

Issue description

Hardware and OS: Intel Core Ultra 7 265K with 32 GB RAM and a 128 GB swap file, running Kubuntu 25.10.

The model works well on the CPU, using about 16 GB of RAM. When running on the iGPU, memory usage keeps climbing until it fills the entire 32 GB, and then everything is killed.

Reducing the buffer length to 2048 did not help.

The model is on the list of "AI Models verified for OpenVINO".

Step-by-step reproduction

  1. Exported the model with optimum-cli.

  2. On CPU it worked:

LD_PRELOAD="/opt/openvino/ovms/lib/libopenvino_tokenizers.so \
/opt/openvino/ovms/lib/libopenvino_genai.so" \
/opt/openvino/ovms/bin/ovms \
  --model_repository_path /opt/openvino/models \
  --model_name Qwen3-30B-int4 \
  --task text_generation \
  --port 9001 \
  --rest_port 8000 \
  --target_device CPU

  3. On GPU it crashed:

LD_PRELOAD="/opt/openvino/ovms/lib/libopenvino_tokenizers.so \
/opt/openvino/ovms/lib/libopenvino_genai.so" \
/opt/openvino/ovms/bin/ovms \
  --model_repository_path /opt/openvino/models \
  --model_name Qwen3-30B-int4 \
  --task text_generation \
  --port 9001 \
  --rest_port 8000 \
  --target_device GPU

Relevant log output
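
One way to record the memory growth described above is a small sampling loop that runs next to the server. This is a minimal sketch, assuming a Linux host with `/proc`. The helper names `sample_mem` and `watch_mem` are hypothetical and are not part of OVMS. System-wide `MemAvailable` is sampled alongside the process RSS because iGPU allocations may not be fully reflected in the RSS of the `ovms` process.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for logging memory while ovms loads a model.
# Linux-only: reads /proc/meminfo and /proc/<pid>/status.

# Print one sample: epoch seconds, system MemAvailable (kB), process RSS (kB).
sample_mem() {
  local pid="$1"
  local avail rss
  # System-wide available memory in kB.
  avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  # Resident set size of the process in kB; left empty if the process is gone.
  rss=$(awk '/^VmRSS:/ {print $2}' "/proc/${pid}/status" 2>/dev/null || true)
  printf '%s mem_available_kb=%s rss_kb=%s\n' "$(date +%s)" "$avail" "${rss:-NA}"
}

# Print one sample per second for as long as the process is alive.
watch_mem() {
  local pid="$1"
  while kill -0 "$pid" 2>/dev/null; do
    sample_mem "$pid"
    sleep 1
  done
}
```

To use it, start the GPU command from step 3 in the background in the same shell and then run `watch_mem $! | tee ovms_mem.log`. The resulting file shows how quickly available memory drops before the kill.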

Issue submission checklist

  • I'm reporting an issue. It's not a question.
  • I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • There is reproducer code and related data files such as images, videos, models, etc.
