
[BugFix] Modify max_tokens and modify the log and fix #1103 (#1097)

Merged
hsliuustc0106 merged 2 commits into vllm-project:main from amy-why-3459:bugfix_chunk on Jan 30, 2026

Conversation

@amy-why-3459 (Contributor) commented on Jan 30, 2026


Purpose

Fix #1099

Test Plan

vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct \
     --omni \
     --port 8091 \
     --stage-configs-path /vllm_omni/model_executor/stage_configs/qwen3_omni_moe_async_chunk.yaml
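For reference, the served model can be exercised with a request along these lines (a sketch only: the prompt, sampling values, and client code are illustrative; just the port and model name come from the command above, and /v1/chat/completions is vLLM's standard OpenAI-compatible endpoint):

# Hypothetical smoke-test request against the server started above.
# Only port 8091 and the model name are taken from the serve command;
# everything else is illustrative.
import requests

resp = requests.post(
    "http://localhost:8091/v1/chat/completions",
    json={
        "model": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
        "messages": [{"role": "user", "content": "Give a one-sentence summary of vLLM."}],
        "max_tokens": 2048,
        "temperature": 0.9,
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])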

Test Result

{'type': 'request_level_metrics',
  'request_id': 'chatcmpl-9635776e7401b703',
  'e2e_time_ms': 15611.124753952026,
  'e2e_tpt': 6.048479176269673,
  'e2e_total_tokens': 2581,
  'transfers_total_time_ms': 0.0,
  'transfers_total_bytes': 0,
  'stages': {0: {'stage_gen_time_ms': 75.16026496887207,
                 'num_tokens_out': 90,
                 'num_tokens_in': 2491},
             2: {'stage_gen_time_ms': 213.80853652954102, 'num_tokens_out': 0}}}
 [Summary] {'e2e_requests': 1,
  'e2e_total_time_ms': 15611.555576324463,
  'e2e_sum_time_ms': 15611.124753952026,
  'e2e_total_tokens': 2581,
  'e2e_avg_time_per_request_ms': 15611.124753952026,
  'e2e_avg_tokens_per_s': 165.33081636841115,
  'wall_time_ms': 15611.555576324463,
  'final_stage_id': 2,
  'stages': [{'stage_id': 0,
              'requests': 1,
              'tokens': 2581,
              'total_time_ms': 10248.541831970215,
              'avg_time_per_request_ms': 10248.541831970215,
              'avg_tokens_per_s': 251.8407049819125},
             {'stage_id': 1,
              'requests': 1,
              'tokens': 328,
              'total_time_ms': 6840.2392864227295,
              'avg_time_per_request_ms': 6840.2392864227295,
              'avg_tokens_per_s': 47.95153886663746},
             {'stage_id': 2,
              'requests': 1,
              'tokens': 0,
              'total_time_ms': 6722.139120101929,
              'avg_time_per_request_ms': 6722.139120101929,
              'avg_tokens_per_s': 0.0}]}
the request e2e is: 20.947600789368153
the avg e2e is: 20.947600789368153
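For anyone cross-checking the summary, the avg_tokens_per_s fields are simply tokens divided by elapsed seconds; a quick check with the values pasted above:

# Throughput sanity check using the numbers from the metrics dump above.
e2e_total_tokens = 2581
e2e_sum_time_ms = 15611.124753952026
print(e2e_total_tokens / (e2e_sum_time_ms / 1000))    # ~165.33, matches e2e_avg_tokens_per_s

stage0_total_time_ms = 10248.541831970215
print(e2e_total_tokens / (stage0_total_time_ms / 1000))  # ~251.84, matches stage 0 avg_tokens_per_s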

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


 temperature: 0.9
 top_k: 50
-max_tokens: 4096
+max_tokens: 2048 # TODO: The max_tokens of the async_chunk feature cannot exceed 2048.
Collaborator

is this model specific?

@hsliuustc0106 (Collaborator) left a comment

add test results

@david6666666 (Collaborator)

@codex review

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0d8e9d8679

Comment on lines +57 to +61
lock_file = f"/dev/shm/shm_{put_key}_lockfile.lock"
with open(lock_file, "w") as lockf:
    fcntl.flock(lockf, fcntl.LOCK_EX)
    meta = shm_write_bytes(payload, name=put_key)
    fcntl.flock(lockf, fcntl.LOCK_UN)


P2: Clean up per-chunk lock files to avoid /dev/shm leaks

This change creates a new lock file under /dev/shm for every put_key (request+stage+chunk) but there is no corresponding deletion anywhere in the connector. On long-running servers or high‑throughput workloads, these lock files will accumulate indefinitely and can exhaust /dev/shm inode/space limits, which will then cause future SHM writes/locks to fail. This only happens when many chunks are produced, but it’s a realistic production scenario for streaming workloads.

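One possible shape of a fix is sketched below. It reuses put_key, payload, and shm_write_bytes from the quoted hunk and simply removes the lock file once the write has finished; the unlink-after-use strategy is only one option and is not what the PR implements.

import fcntl
import os

def locked_shm_write(put_key: str, payload: bytes):
    # Serialize writers for this key, then drop the lock file so /dev/shm
    # does not accumulate one file per request/stage/chunk.
    lock_file = f"/dev/shm/shm_{put_key}_lockfile.lock"
    with open(lock_file, "w") as lockf:
        fcntl.flock(lockf, fcntl.LOCK_EX)
        try:
            meta = shm_write_bytes(payload, name=put_key)  # helper from the quoted hunk
        finally:
            fcntl.flock(lockf, fcntl.LOCK_UN)
    try:
        os.unlink(lock_file)  # best effort: another process may have removed it already
    except FileNotFoundError:
        pass
    return meta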

david6666666 changed the title from "[BugFix] Modify max_tokens and modify the log" to "[BugFix] Modify max_tokens and modify the log and fix #1103" on Jan 30, 2026
david6666666 added this to the v0.14.0 milestone on Jan 30, 2026
hsliuustc0106 merged commit e6b9d52 into vllm-project:main on Jan 30, 2026
6 of 7 checks passed
dongbo910220 pushed a commit to dongbo910220/vllm-omni that referenced this pull request on Feb 1, 2026