[Test] Add precision test cases for Qwen3-Omni-30B-A3B-Instruct in CI #828
hsliuustc0106 merged 36 commits into vllm-project:main from
Conversation
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 769df694b1
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
audio_content = convert_audio_to_text(audio_data)
print(f"text content is: {text_content}")
print(f"audio content is: {audio_content}")
assert cosine_similarity_text(audio_content.lower(), text_content.lower()) > 0.9, (
Text input scenario similarity: 1
Audio input scenario average similarity: 0.9425
Image input scenario average similarity: 0.9870
Video input scenario average similarity: 0.9655
Audio truncation error scenario average similarity: 0.6484
Considering factors such as recognition errors from the Whisper model, the threshold is set to 0.9.
If the audio truncation affects only a few tokens (for example, only the last character is cut off), the similarity score may still exceed 0.9. To catch such missed detections, it is recommended to add a check that compares the last few characters in a subsequent PR.
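A minimal sketch of that tail-comparison idea. `cosine_like_similarity` is a stand-in for the suite's `cosine_similarity_text` helper (its real implementation is not shown in this PR), so `difflib.SequenceMatcher` is used here only to keep the example self-contained:

```python
from difflib import SequenceMatcher


def cosine_like_similarity(a: str, b: str) -> float:
    # Stand-in for the suite's cosine_similarity_text helper (assumed here);
    # SequenceMatcher's ratio plays the role of the similarity score.
    return SequenceMatcher(None, a, b).ratio()


def tails_match(expected: str, actual: str, n: int = 5) -> bool:
    # Compare the last n characters so a truncated ending still fails
    # even when the overall similarity stays above the 0.9 threshold.
    return expected[-n:].lower() == actual[-n:].lower()


full = "the quick brown fox jumps over the lazy dog"
truncated = full[:-1]  # last character cut off

# Similarity alone stays above 0.9, but the tail check catches the truncation.
assert cosine_like_similarity(full, truncated) > 0.9
assert not tails_match(full, truncated)
```

With both checks combined, a transcript that loses only its final characters would be flagged even though its similarity score alone would pass.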
amd-ci failed

@tjtanaa Hi, TJian. Could you help review the failing case in the AMD Buildkite test? We are planning to add precision verification tests for Qwen3-Omni-30B-A3B-Instruct, but one case shows a strange error.

ok, let me check
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Currently, there is a garbled-output issue on AMD machines that causes AMD-CI failures. I have modified the test-amd configuration to temporarily skip this test case in AMD environments. The case will be re-enabled once the garbled-output issue is resolved.
.buildkite/test-amd.yaml
Outdated
- export MIOPEN_DEBUG_CONV_DIRECT=0
- export MIOPEN_DEBUG_CONV_GEMM=0
- pytest -s -v tests/e2e/offline_inference/test_qwen3_omni.py tests/e2e/online_serving/test_qwen3_omni.py
- pytest -s -v tests/e2e/offline_inference/test_qwen3_omni.py tests/e2e/online_serving/test_qwen3_omni.py::test_video_to_audio_concurrent
@yenuo26 can you do this in tests/e2e/online_serving/test_qwen3_omni.py instead, by adding the @pytest.mark.skipif() decorator to the test that is failing?
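For illustration, a sketch of moving the skip from the Buildkite pipeline config into the test file itself. The `is_rocm()` body here is a placeholder (the suite's real detection helper is not shown in this PR), and the test body is elided:

```python
import os

import pytest


def is_rocm() -> bool:
    # Placeholder for the suite's ROCm detection helper (assumed); a real
    # implementation might probe torch.version.hip instead of an env var.
    return os.environ.get("VLLM_TARGET_DEVICE", "") == "rocm"


@pytest.mark.skipif(is_rocm(), reason="Known garbled-output issue on AMD; re-enable once fixed")
def test_video_to_audio_concurrent() -> None:
    # Body elided; the point is that the skip lives next to the test
    # instead of in .buildkite/test-amd.yaml, so other platforms still run it.
    assert True
```

Keeping the skip in the test keeps the pipeline YAML identical across platforms, and the skip reason shows up directly in the pytest report.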
@yenuo26 can you add [ROCm] to the issue title?
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
fix precommits
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
fixed
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Fix CI or retest it again? I have a question: how many omni-servers have been launched for the omni model test in the H100 workflow?
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
# Verify text output success
assert text_content is not None and len(text_content) >= 2, "No text output is generated"
assert "square" in text_content.lower(), "The output do not contain keywords."
(Worker_TP0 pid=12753) [Stage-0] INFO 01-22 16:33:25 [multiproc_executor.py:707] Parent process exited, terminating worker
(Worker_TP1 pid=12754) [Stage-0] INFO 01-22 16:33:25 [multiproc_executor.py:707] Parent process exited, terminating worker
[Stage-0] INFO 01-22 16:33:27 [omni_stage.py:1498] Stage worker exiting
PASSED
=========================================================================== FAILURES ===========================================================================
___________________________________________________________ test_mix_to_text_audio_001[omni_server0] ___________________________________________________________
client = <openai.OpenAI object at 0x79226172b860>, omni_server = <tests.conftest.OmniServer object at 0x79226172bda0>
@pytest.mark.skipif(is_rocm(), reason="Test skipped on AMD environment due to known output issues")
@pytest.mark.parametrize("omni_server", test_params, indirect=True)
def test_mix_to_text_audio_001(client: openai.OpenAI, omni_server) -> None:
"""
Test multi-modal input processing and text/audio output generation via OpenAI API.
Deploy Setting: default yaml
Input Modal: text + audio + video + image
Output Modal: text + audio
Input Setting: stream=True
Datasets: single request
"""
# Test single completion
e2e_list = list()
video_data_url = f"data:video/mp4;base64,{generate_synthetic_video(224, 224, 300)['base64']}"
image_data_url = f"data:image/jpeg;base64,{generate_synthetic_image(224, 224)['base64']}"
audio_data_url = f"data:audio/wav;base64,{generate_synthetic_audio(5, 1)['base64']}"
messages = dummy_messages_from_mix_data(
system_prompt=get_system_prompt(),
video_data_url=video_data_url,
image_data_url=image_data_url,
audio_data_url=audio_data_url,
content_text=get_prompt("mix"),
)
# Test single completion
start_time = time.perf_counter()
chat_completion = client.chat.completions.create(model=omni_server.model, messages=messages, stream=True)
text_content = ""
audio_data = None
for chunk in chat_completion:
for choice in chunk.choices:
if hasattr(choice, "delta"):
content = getattr(choice.delta, "content", None)
else:
content = None
modality = getattr(chunk, "modality", None)
if modality == "audio" and content:
# Audio chunk - content
if audio_data is None:
audio_data = content
else:
audio_data += content
elif modality == "text" and content:
# Text chunk - accumulate text content
text_content += content if content else ""
# Verify E2E
current_e2e = time.perf_counter() - start_time
print(f"the request e2e is: {current_e2e}")
# TODO: Verify the E2E latency after confirmation baseline.
e2e_list.append(current_e2e)
print(f"the avg e2e is: {sum(e2e_list) / len(e2e_list)}")
# Verify all completions succeeded
assert audio_data is not None, "No audio output is generated"
# Verify text output success
assert text_content is not None and len(text_content) >= 2, "No text output is generated"
assert "square" in text_content.lower(), "The output do not contain keywords."
E AssertionError: The output do not contain keywords.
E assert 'square' in 'the audio contains the sound of flowing water.\n\nthe image displays five colored spheres against a black background:\n* a yellow sphere.\n* a green sphere.\n* a purple sphere.\n* two brown spheres.\n\nthese spheres move around the screen, sometimes overlapping with each other.'
E + where 'the audio contains the sound of flowing water.\n\nthe image displays five colored spheres against a black background:\n* a yellow sphere.\n* a green sphere.\n* a purple sphere.\n* two brown spheres.\n\nthese spheres move around the screen, sometimes overlapping with each other.' = <built-in method lower of str object at 0x7927a613ee70>()
E + where <built-in method lower of str object at 0x7927a613ee70> = 'The audio contains the sound of flowing water.\n\nThe image displays five colored spheres against a black background:\n* A yellow sphere.\n* A green sphere.\n* A purple sphere.\n* Two brown spheres.\n\nThese spheres move around the screen, sometimes overlapping with each other.'.lower
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
try the CI 5 times

Purpose
This PR aims to add CI tests for the precision test cases of Qwen3-Omni-30B-A3B-Instruct.
For the design and plan, please refer to #400.
After the modifications, the total execution time for the two Qwen3-Omni online test cases is 7 minutes.
Test Plan
pytest -sv test_qwen3_omni.py --html=report.html --self-contained-html --capture=sys
Test Result
CI Result