
feat: support decode warmup for graph pre-capture. #1502

Merged
weizhehuang0827 merged 1 commit into jd-opensource:main from weizhehuang0827:refactor_offline on May 21, 2026

Conversation

@weizhehuang0827
Collaborator

No description provided.

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request propagates max_tokens_per_batch and max_seqs_per_batch through the distributed runtime and graph executors, replacing global SchedulerConfig access with locally passed options. It also updates the SpawnWorkerServer process to receive these parameters as command-line arguments and refactors the ProfileManager warmup logic. The review identifies two issues: a regression in warmup batch-size generation that could skip capturing the graph for the maximum batch capacity, and a style-guide violation in which the default value of max_seqs_per_batch in ProfileManager does not match the global configuration default.
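The first finding concerns the list of decode batch sizes that warmup pre-captures graphs for. One way such a regression can arise is a generator that only emits sizes on a fixed grid, so a max_seqs_per_batch that falls between grid points never gets its own graph. A minimal C++ sketch of a guard against that, assuming a hypothetical warmup_batch_sizes helper and a powers-of-two-then-fixed-step schedule (neither is taken from the xllm source), appends the maximum capacity explicitly:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: decode batch sizes to pre-capture graphs for.
// Schedule (an assumption): 1, 2, 4, ... below `step`, then multiples of
// `step`, and finally max_seqs_per_batch itself so the maximum-capacity
// graph is always captured even when it is not on the grid.
std::vector<int32_t> warmup_batch_sizes(int32_t max_seqs_per_batch,
                                        int32_t step = 16) {
  std::vector<int32_t> sizes;
  // Invalid inputs produce no sizes (and avoid an infinite loop on step <= 0).
  if (max_seqs_per_batch <= 0 || step <= 0) return sizes;

  // Small batches: powers of two strictly below both step and the max.
  for (int32_t bs = 1; bs < step && bs < max_seqs_per_batch; bs *= 2) {
    sizes.push_back(bs);
  }
  // Larger batches: multiples of step strictly below the max.
  for (int32_t bs = step; bs < max_seqs_per_batch; bs += step) {
    sizes.push_back(bs);
  }
  // Always include the maximum capacity; the loops above stop strictly
  // below it, so this never duplicates an entry.
  sizes.push_back(max_seqs_per_batch);

  assert(std::is_sorted(sizes.begin(), sizes.end()));
  return sizes;
}
```

For example, warmup_batch_sizes(20) yields {1, 2, 4, 8, 16, 20}, while warmup_batch_sizes(16) yields {1, 2, 4, 8, 16} without a duplicate final entry. Using strict `<` bounds in both loops and a single unconditional push at the end is what keeps the maximum present exactly once.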

Comment thread xllm/core/scheduler/profile/profile_manager.cpp
Comment thread xllm/core/scheduler/profile/profile_manager.h Outdated
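The second finding (a ProfileManager default for max_seqs_per_batch that differs from the global configuration default) and the move from global SchedulerConfig access to locally passed options can both be illustrated with a hedged C++ sketch. Every name below (batch_defaults, BatchLimits, ProfileManagerOptions, GraphExecutor) is hypothetical, and the numeric defaults are placeholders rather than xllm's real values; the point is only that defining each default once and passing the limits explicitly keeps components from disagreeing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single source of truth for batch-capacity defaults.
// Both values are PLACEHOLDERS, not xllm's actual defaults.
namespace batch_defaults {
inline constexpr int32_t kMaxTokensPerBatch = 20480;  // placeholder
inline constexpr int32_t kMaxSeqsPerBatch = 256;      // placeholder
}  // namespace batch_defaults

// Hypothetical options bundle passed down explicitly instead of being
// read from a global scheduler config.
struct BatchLimits {
  int32_t max_tokens_per_batch = batch_defaults::kMaxTokensPerBatch;
  int32_t max_seqs_per_batch = batch_defaults::kMaxSeqsPerBatch;
};

// Hypothetical ProfileManager options: reusing BatchLimits means its
// default max_seqs_per_batch cannot drift from the shared constant.
struct ProfileManagerOptions {
  BatchLimits limits;
};

// Hypothetical executor that receives its limits at construction time.
class GraphExecutor {
 public:
  explicit GraphExecutor(const BatchLimits& limits) : limits_(limits) {
    // Debug-only sanity check on the limits handed in by the caller.
    assert(limits_.max_tokens_per_batch > 0 && limits_.max_seqs_per_batch > 0);
  }

  // True if a batch of this shape is within the configured capacity.
  bool fits(int32_t num_seqs, int32_t num_tokens) const {
    return num_seqs <= limits_.max_seqs_per_batch &&
           num_tokens <= limits_.max_tokens_per_batch;
  }

 private:
  BatchLimits limits_;
};
```

Passing BatchLimits into each component (worker, executor, profile manager) keeps every consumer on the values the caller chose, and changing a default in batch_defaults changes it everywhere at once.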
Collaborator

@yq33victor left a comment


LGTM

@weizhehuang0827 merged commit 7a65e66 into jd-opensource:main on May 21, 2026
4 of 23 checks passed

Labels

None yet

Projects

None yet


3 participants