
[BugFix] Modify max_tokens and modify the log and fix #1103 (#1097)

Merged
hsliuustc0106 merged 2 commits into vllm-project:main from amy-why-3459:bugfix_chunk on Jan 30, 2026

Conversation

@amy-why-3459 (Contributor) commented on Jan 30, 2026


Purpose

Fix #1099

Test Plan

vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct \
     --omni \
     --port 8091 \
     --stage-configs-path /vllm_omni/model_executor/stage_configs/qwen3_omni_moe_async_chunk.yaml
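For reference, the served model can be exercised with a request along these lines (a sketch only: the prompt, sampling values, and client code are illustrative; just the port and model name come from the command above, and /v1/chat/completions is vLLM's standard OpenAI-compatible endpoint):

# Hypothetical smoke-test request against the server started above.
# Only port 8091 and the model name are taken from the serve command;
# everything else is illustrative.
import requests

resp = requests.post(
    "http://localhost:8091/v1/chat/completions",
    json={
        "model": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
        "messages": [{"role": "user", "content": "Give a one-sentence summary of vLLM."}],
        "max_tokens": 2048,
        "temperature": 0.9,
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])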

Test Result

{'type': 'request_level_metrics',
  'request_id': 'chatcmpl-9635776e7401b703',
  'e2e_time_ms': 15611.124753952026,
  'e2e_tpt': 6.048479176269673,
  'e2e_total_tokens': 2581,
  'transfers_total_time_ms': 0.0,
  'transfers_total_bytes': 0,
  'stages': {0: {'stage_gen_time_ms': 75.16026496887207,
                 'num_tokens_out': 90,
                 'num_tokens_in': 2491},
             2: {'stage_gen_time_ms': 213.80853652954102, 'num_tokens_out': 0}}}
 [Summary] {'e2e_requests': 1,
  'e2e_total_time_ms': 15611.555576324463,
  'e2e_sum_time_ms': 15611.124753952026,
  'e2e_total_tokens': 2581,
  'e2e_avg_time_per_request_ms': 15611.124753952026,
  'e2e_avg_tokens_per_s': 165.33081636841115,
  'wall_time_ms': 15611.555576324463,
  'final_stage_id': 2,
  'stages': [{'stage_id': 0,
              'requests': 1,
              'tokens': 2581,
              'total_time_ms': 10248.541831970215,
              'avg_time_per_request_ms': 10248.541831970215,
              'avg_tokens_per_s': 251.8407049819125},
             {'stage_id': 1,
              'requests': 1,
              'tokens': 328,
              'total_time_ms': 6840.2392864227295,
              'avg_time_per_request_ms': 6840.2392864227295,
              'avg_tokens_per_s': 47.95153886663746},
             {'stage_id': 2,
              'requests': 1,
              'tokens': 0,
              'total_time_ms': 6722.139120101929,
              'avg_time_per_request_ms': 6722.139120101929,
              'avg_tokens_per_s': 0.0}]}
the request e2e is: 20.947600789368153
the avg e2e is: 20.947600789368153
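For anyone cross-checking the summary, the avg_tokens_per_s fields are simply tokens divided by elapsed seconds; a quick check with the values pasted above:

# Throughput sanity check using the numbers from the metrics dump above.
e2e_total_tokens = 2581
e2e_sum_time_ms = 15611.124753952026
print(e2e_total_tokens / (e2e_sum_time_ms / 1000))    # ~165.33, matches e2e_avg_tokens_per_s

stage0_total_time_ms = 10248.541831970215
print(e2e_total_tokens / (stage0_total_time_ms / 1000))  # ~251.84, matches stage 0 avg_tokens_per_s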

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


 temperature: 0.9
 top_k: 50
-max_tokens: 4096
+max_tokens: 2048 # TODO: The max_tokens of the async_chunk feature cannot exceed 2048.
Collaborator

is this model specific?

@hsliuustc0106 (Collaborator) left a comment

add test results

@david6666666 (Collaborator)

@codex review

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0d8e9d8679

Comment on lines +57 to +61
lock_file = f"/dev/shm/shm_{put_key}_lockfile.lock"
with open(lock_file, "w") as lockf:
    fcntl.flock(lockf, fcntl.LOCK_EX)
    meta = shm_write_bytes(payload, name=put_key)
    fcntl.flock(lockf, fcntl.LOCK_UN)


P2: Clean up per-chunk lock files to avoid /dev/shm leaks

This change creates a new lock file under /dev/shm for every put_key (request+stage+chunk) but there is no corresponding deletion anywhere in the connector. On long-running servers or high‑throughput workloads, these lock files will accumulate indefinitely and can exhaust /dev/shm inode/space limits, which will then cause future SHM writes/locks to fail. This only happens when many chunks are produced, but it’s a realistic production scenario for streaming workloads.

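One possible shape of a fix is sketched below. It reuses put_key, payload, and shm_write_bytes from the quoted hunk and simply removes the lock file once the write has finished; the unlink-after-use strategy is only one option and is not what the PR implements.

import fcntl
import os

def locked_shm_write(put_key: str, payload: bytes):
    # Serialize writers for this key, then drop the lock file so /dev/shm
    # does not accumulate one file per request/stage/chunk.
    lock_file = f"/dev/shm/shm_{put_key}_lockfile.lock"
    with open(lock_file, "w") as lockf:
        fcntl.flock(lockf, fcntl.LOCK_EX)
        try:
            meta = shm_write_bytes(payload, name=put_key)  # helper from the quoted hunk
        finally:
            fcntl.flock(lockf, fcntl.LOCK_UN)
    try:
        os.unlink(lock_file)  # best effort: another process may have removed it already
    except FileNotFoundError:
        pass
    return meta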

david6666666 changed the title from "[BugFix] Modify max_tokens and modify the log" to "[BugFix] Modify max_tokens and modify the log and fix #1103" on Jan 30, 2026
david6666666 added this to the v0.14.0 milestone on Jan 30, 2026
hsliuustc0106 merged commit e6b9d52 into vllm-project:main on Jan 30, 2026
6 of 7 checks passed
dongbo910220 pushed a commit to dongbo910220/vllm-omni that referenced this pull request on Feb 1, 2026