[Bug]: qwen3-30b-a3b on ovms works on CPU but crashes with out of memory on iGPU #34187

### OpenVINO Version

2025.4.1, 2026.0

### Operating System

Other (Please specify in description)

### Device used for inference

GPU

### Framework

None

### Model used

qwen3-30b-a3b

### Issue description

System: Intel Core Ultra 7 265K with 32 GB RAM and a 128 GB swap file, running Kubuntu 25.10.

The model works well on the CPU, using about 16 GB of RAM. On the iGPU, memory usage keeps growing until it fills all 32 GB, and then everything is killed.

Reducing the buffer length to 2048 did not help.

The model is on the list of "AI Models verified for OpenVINO".

### Step-by-step reproduction

1. Exported the model with optimum-cli.

2. On CPU it worked:
```
LD_PRELOAD="/opt/openvino/ovms/lib/libopenvino_tokenizers.so \
/opt/openvino/ovms/lib/libopenvino_genai.so" \
/opt/openvino/ovms/bin/ovms \
  --model_repository_path /opt/openvino/models \
  --model_name Qwen3-30B-int4 \
  --task text_generation \
  --port 9001 \
  --rest_port 8000 \
  --target_device CPU
```

3. On GPU it crashed:
```
LD_PRELOAD="/opt/openvino/ovms/lib/libopenvino_tokenizers.so \
/opt/openvino/ovms/lib/libopenvino_genai.so" \
/opt/openvino/ovms/bin/ovms \
  --model_repository_path /opt/openvino/models \
  --model_name Qwen3-30B-int4 \
  --task text_generation \
  --port 9001 \
  --rest_port 8000 \
  --target_device GPU
```
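To turn the memory growth described above into data for the log section, the following is a minimal sketch that samples memory at a fixed interval while ovms loads the model. It assumes a Linux `/proc` filesystem and that ovms runs as the current user; the file name `watch_mem.sh` and all function names are hypothetical. Because memory the iGPU driver allocates may not be fully counted in the process RSS, it also logs system-wide `MemAvailable` and used swap.

```shell
#!/bin/sh
# watch_mem.sh (hypothetical helper): log memory usage while ovms loads a model.
# Source it, then call: watch_mem <ovms_pid> [interval_seconds]

# System-wide available memory in kB, read from /proc/meminfo.
mem_available_kb() {
  awk '/^MemAvailable:/ {print $2}' /proc/meminfo
}

# Used swap in kB (SwapTotal - SwapFree), read from /proc/meminfo.
swap_used_kb() {
  awk '/^SwapTotal:/ {t = $2} /^SwapFree:/ {f = $2} END {print t - f}' /proc/meminfo
}

# Resident set size of one process in kB, read from /proc/<pid>/status.
# Memory the iGPU driver allocates on the process's behalf may not be fully
# reflected here, which is why the system-wide figures are logged too.
rss_kb() {
  awk '/^VmRSS:/ {print $2}' "/proc/$1/status" 2>/dev/null
}

# Print a CSV header, then one row per interval until the process exits.
watch_mem() {
  local pid=$1
  local interval=${2:-1}
  echo "time,pid_rss_kb,mem_available_kb,swap_used_kb"
  while kill -0 "$pid" 2>/dev/null; do
    echo "$(date +%T),$(rss_kb "$pid"),$(mem_available_kb),$(swap_used_kb)"
    sleep "$interval"
  done
  echo "# process $pid no longer running"
}
```

For example, in a second terminal while step 3 is running: `. ./watch_mem.sh && watch_mem "$(pgrep -n -f ovms/bin/ovms)" 1 | tee ovms_gpu_mem.csv`. The swap column shows whether the 128 GB swap file is used at all before the kill.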

### Relevant log output

```shell

```
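Since everything is killed once RAM is full, the kernel log would likely show whether the Linux OOM killer was responsible (an assumption, not confirmed in this report). The following is a minimal sketch of a filter for those kernel messages; the function name `oom_lines` is hypothetical, and the match patterns assume the usual wording of OOM-killer messages, which varies between kernel versions.

```shell
# Keep only kernel log lines written by the Linux OOM killer.
# Patterns cover common wordings such as "Out of memory: Killed process ..."
# and "oom-kill:constraint=..."; exact text differs between kernel versions.
oom_lines() {
  grep -i -E 'out of memory|oom-kill|killed process'
}
```

For example, after the crash: `sudo dmesg | oom_lines` or `journalctl -k -b | oom_lines`.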

### Issue submission checklist

- [x] I'm reporting an issue. It's not a question.
- [x] I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
- [x] There is reproducer code and related data files such as images, videos, models, etc.
