
[Test] Add precision test cases for Qwen3-Omni-30B-A3B-Instruct in CI #828

Merged
hsliuustc0106 merged 36 commits into vllm-project:main from yenuo26:ci
Jan 23, 2026

Conversation


@yenuo26 yenuo26 commented Jan 17, 2026


Purpose

This PR adds CI tests covering the precision test cases for Qwen3-Omni-30B-A3B-Instruct.
For the design and plan, please refer to #400.
After these modifications, the total execution time for the two Qwen3-Omni online test cases is 7 minutes.

Test Plan

pytest -sv test_qwen3_omni.py --html=report.html --self-contained-html --capture=sys

Test Result

[screenshot: test results]

CI Result

[screenshots: CI results]
Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


wangyu31577 and others added 3 commits January 17, 2026 18:44
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
@yenuo26 yenuo26 mentioned this pull request Jan 17, 2026
1 task

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 769df694b1


@hsliuustc0106 hsliuustc0106 linked an issue Jan 17, 2026 that may be closed by this pull request
1 task
audio_content = convert_audio_to_text(audio_data)
print(f"text content is: {text_content}")
print(f"audio content is: {audio_content}")
assert cosine_similarity_text(audio_content.lower(), text_content.lower()) > 0.9, (
Collaborator


why 0.9?

Contributor Author


Text input scenario similarity: 1
Audio input scenario average similarity: 0.9425
Image input scenario average similarity: 0.9870
Video input scenario average similarity: 0.9655
Audio truncation error scenario average similarity: 0.6484

Considering factors such as recognition errors in the Whisper model, the threshold is preset to 0.9.
If audio truncation involves only a few tokens (for example, only the last character is truncated), the similarity score may still exceed 0.9. It is recommended to address such missed detections in a follow-up PR by adding a check that compares the last few characters.
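For context, this check compares the ASR transcript of the generated audio with the text output. A minimal sketch of such a similarity function, assuming a simple bag-of-words vectorization (the repository's `cosine_similarity_text` may be implemented differently, e.g. over embeddings):

```python
import math
from collections import Counter


def cosine_similarity_text(a: str, b: str) -> float:
    """Cosine similarity between two strings using bag-of-words token counts."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


# Identical text scores ~1.0; unrelated text scores 0.0, so a 0.9 threshold
# tolerates minor ASR errors while still catching truncated audio.
print(cosine_similarity_text("the cat sat", "the cat sat"))
print(cosine_similarity_text("a b", "c d"))
```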

Collaborator


please add to your test RFC #400

Contributor Author


Added

@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Jan 17, 2026
@hsliuustc0106
Collaborator

amd-ci failed

@congw729
Contributor

congw729 commented Jan 19, 2026

@tjtanaa Hi TJian, could you help review the failed case in the AMD Buildkite test? We plan to add precision verification tests for Qwen3-Omni-30B-A3B-Instruct, but one case shows a strange error.
[screenshot: AMD CI failure]

@tjtanaa
Contributor

tjtanaa commented Jan 19, 2026

ok let me check

wangyu31577 added 2 commits January 19, 2026 17:20
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
@yenuo26
Contributor Author

yenuo26 commented Jan 19, 2026

amd-ci failed

Currently, garbled output on AMD machines is causing the AMD CI failures. I have modified the test-amd configuration to temporarily skip this test case in AMD environments; it will be re-enabled once the garbled-output issue is resolved.

- export MIOPEN_DEBUG_CONV_DIRECT=0
- export MIOPEN_DEBUG_CONV_GEMM=0
- pytest -s -v tests/e2e/offline_inference/test_qwen3_omni.py tests/e2e/online_serving/test_qwen3_omni.py
- pytest -s -v tests/e2e/offline_inference/test_qwen3_omni.py tests/e2e/online_serving/test_qwen3_omni.py::test_video_to_audio_concurrent
Contributor


@yenuo26 Can you do this in tests/e2e/online_serving/test_qwen3_omni.py instead, by adding a @pytest.mark.skipif() decorator to the failing test?
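The suggested pattern, as a minimal pytest sketch (the `is_rocm()` body below is a hypothetical stand-in for the repository's ROCm-detection helper):

```python
import os

import pytest


def is_rocm() -> bool:
    # Hypothetical stand-in for the repository's ROCm-detection helper;
    # here it just checks for a ROCm installation path in the environment.
    return os.environ.get("ROCM_PATH") is not None


@pytest.mark.skipif(is_rocm(), reason="Test skipped on AMD environment due to known output issues")
def test_example() -> None:
    # The failing test body would go here; the decorator makes pytest
    # report the test as skipped (rather than failed) on AMD/ROCm machines.
    assert True
```

This keeps the skip logic next to the affected test rather than in the CI configuration, so other tests in the file still run on AMD.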

Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Contributor

@tjtanaa tjtanaa left a comment


@yenuo26 LGTM. Thanks for the update. I am still looking into the issue. Will fix it in upcoming PR.

@yenuo26
Contributor Author

yenuo26 commented Jan 20, 2026

@yenuo26 LGTM. Thanks for the update. I am still looking into the issue. Will fix it in upcoming PR.

OK, thanks. Once the PR is submitted, you can link it to this issue: #846

@yenuo26 yenuo26 requested a review from hsliuustc0106 January 20, 2026 02:17
@tjtanaa
Contributor

tjtanaa commented Jan 20, 2026

@yenuo26 can you add [ROCm] to the issue title?

@yenuo26
Contributor Author

yenuo26 commented Jan 20, 2026

@yenuo26 can you add [ROCm] to the issue title?
Added

Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
@hsliuustc0106
Collaborator

fix precommits

yenuo26 and others added 2 commits January 21, 2026 09:54
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
@yenuo26
Contributor Author

yenuo26 commented Jan 21, 2026

fix precommits

fixed

Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
@david6666666 david6666666 added the ready label to trigger buildkite CI label Jan 22, 2026
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
@hsliuustc0106
Collaborator

Fix CI or retest it again? I have a question: how many omni-servers are launched for the omni model test H100 workflow?

wangyu31577 and others added 8 commits January 22, 2026 21:55
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>

# Verify text output success
assert text_content is not None and len(text_content) >= 2, "No text output is generated"
assert "square" in text_content.lower(), "The output do not contain keywords."
Collaborator


(Worker_TP0 pid=12753) [Stage-0] INFO 01-22 16:33:25 [multiproc_executor.py:707] Parent process exited, terminating worker
(Worker_TP1 pid=12754) [Stage-0] INFO 01-22 16:33:25 [multiproc_executor.py:707] Parent process exited, terminating worker
[Stage-0] INFO 01-22 16:33:27 [omni_stage.py:1498] Stage worker exiting

PASSED

=========================================================================== FAILURES ===========================================================================
___________________________________________________________ test_mix_to_text_audio_001[omni_server0] ___________________________________________________________

client = <openai.OpenAI object at 0x79226172b860>, omni_server = <tests.conftest.OmniServer object at 0x79226172bda0>

    @pytest.mark.skipif(is_rocm(), reason="Test skipped on AMD environment due to known output issues")
    @pytest.mark.parametrize("omni_server", test_params, indirect=True)
    def test_mix_to_text_audio_001(client: openai.OpenAI, omni_server) -> None:
        """
        Test multi-modal input processing and text/audio output generation via OpenAI API.
        Deploy Setting: default yaml
        Input Modal: text + audio + video + image
        Output Modal: text + audio
        Input Setting: stream=True
        Datasets: single request
        """
        # Test single completion
        e2e_list = list()
        video_data_url = f"data:video/mp4;base64,{generate_synthetic_video(224, 224, 300)['base64']}"
        image_data_url = f"data:image/jpeg;base64,{generate_synthetic_image(224, 224)['base64']}"
        audio_data_url = f"data:audio/wav;base64,{generate_synthetic_audio(5, 1)['base64']}"
        messages = dummy_messages_from_mix_data(
            system_prompt=get_system_prompt(),
            video_data_url=video_data_url,
            image_data_url=image_data_url,
            audio_data_url=audio_data_url,
            content_text=get_prompt("mix"),
        )
        # Test single completion
        start_time = time.perf_counter()
        chat_completion = client.chat.completions.create(model=omni_server.model, messages=messages, stream=True)
        text_content = ""
        audio_data = None
        for chunk in chat_completion:
            for choice in chunk.choices:
                if hasattr(choice, "delta"):
                    content = getattr(choice.delta, "content", None)
                else:
                    content = None
                modality = getattr(chunk, "modality", None)
                if modality == "audio" and content:
                    # Audio chunk - content
                    if audio_data is None:
                        audio_data = content
                    else:
                        audio_data += content
                elif modality == "text" and content:
                    # Text chunk - accumulate text content
                    text_content += content if content else ""
        # Verify E2E
        current_e2e = time.perf_counter() - start_time
        print(f"the request e2e is: {current_e2e}")
        # TODO: Verify the E2E latency after confirmation baseline.
        e2e_list.append(current_e2e)
        print(f"the avg e2e is: {sum(e2e_list) / len(e2e_list)}")
        # Verify all completions succeeded
        assert audio_data is not None, "No audio output is generated"
        # Verify text output success
        assert text_content is not None and len(text_content) >= 2, "No text output is generated"
        assert "square" in text_content.lower(), "The output do not contain keywords."
E       AssertionError: The output do not contain keywords.
E       assert 'square' in 'the audio contains the sound of flowing water.\n\nthe image displays five colored spheres against a black background:\n* a yellow sphere.\n* a green sphere.\n* a purple sphere.\n* two brown spheres.\n\nthese spheres move around the screen, sometimes overlapping with each other.'
E       +  where 'the audio contains the sound of flowing water.\n\nthe image displays five colored spheres against a black background:\n* a yellow sphere.\n* a green sphere.\n* a purple sphere.\n* two brown spheres.\n\nthese spheres move around the screen, sometimes overlapping with each other.' = <built-in method lower of str object at 0x7927a613ee70>()
E       +  where <built-in method lower of str object at 0x7927a613ee70> = 'The audio contains the sound of flowing water.\n\nThe image displays five colored spheres against a black background:\n* A yellow sphere.\n* A green sphere.\n* A purple sphere.\n* Two brown spheres.\n\nThese spheres move around the screen, sometimes overlapping with each other.'.lower
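For reference, the base64 data URLs used as inputs in this test can be built from raw media bytes as sketched below (a generic illustration; the repository's generate_synthetic_* helpers already return the base64 payload directly):

```python
import base64


def to_data_url(raw: bytes, mime: str) -> str:
    """Encode raw media bytes as a base64 data URL suitable for multimodal chat messages."""
    b64 = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{b64}"


# Two placeholder bytes encode to "AAE=".
print(to_data_url(b"\x00\x01", "audio/wav"))  # data:audio/wav;base64,AAE=
```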


@david6666666 david6666666 modified the milestones: v0.14.0rc1, v0.14.0 Jan 23, 2026
wangyu31577 added 2 commits January 23, 2026 10:00
@hsliuustc0106
Copy link
Collaborator

try the ci for 5 times

wangyu31577 and others added 5 commits January 23, 2026 10:28
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
@hsliuustc0106 hsliuustc0106 merged commit 34c9d8f into vllm-project:main Jan 23, 2026
7 checks passed
@yenuo26 yenuo26 deleted the ci branch January 23, 2026 10:32

Labels

ready label to trigger buildkite CI


Development

Successfully merging this pull request may close these issues.

[RFC]: vllm-omni CI/CD plan

5 participants