
[Core] Add Diffusion executor #865

Merged: hsliuustc0106 merged 3 commits into vllm-project:main from natureofnature:diffusion_executor on Jan 21, 2026

Conversation

@natureofnature (Contributor) commented Jan 20, 2026

Purpose

This PR aligns the diffusion execution stack with vLLM's model executor structure, extracting executor responsibilities from the engine and making the executor pluggable. The refactor lays the foundation for worker-actor based execution and lets RL-specific executors be swapped in; a Ray-based worker actor could later be added on top of this to support distributed execution. A sketch of the resulting interface follows the goals below.
Key goals:

  1. Align with vLLM model executor design
  2. Enable future worker-actor implementations
  3. Support RL workflows via executor replacement ([Feature]: Support ray backend support for Omni Diffusion Worker #796, [RFC]: Reinforcement learning support on vllm-omni #778)
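To make the intent concrete, here is a minimal sketch of what such a pluggable executor boundary looks like. The class and method names below are assumptions modeled on vLLM's v1 Executor abstraction, not code from this PR:

    from abc import ABC, abstractmethod
    from typing import Any


    class DiffusionExecutor(ABC):
        """Pluggable boundary: the engine drives workers only through this API."""

        def __init__(self, od_config) -> None:
            self.od_config = od_config
            self._init_executor()

        @abstractmethod
        def _init_executor(self) -> None:
            """Spawn workers (in-process, multiprocess, or, later, Ray actors)."""

        @abstractmethod
        def collective_rpc(self, method: str, *args: Any, **kwargs: Any) -> list[Any]:
            """Invoke `method` on every worker and gather the results."""

        @abstractmethod
        def shutdown(self) -> None:
            """Tear down workers and release their resources."""

An RL- or Ray-specific backend then only needs to subclass this interface; the engine itself stays unchanged.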

Test Plan

Test Result



@natureofnature (Contributor, Author): @codex review

@chatgpt-codex-connector: Codex Review: Didn't find any major issues. Another round soon, please!

@natureofnature (Contributor, Author): @codex review

@chatgpt-codex-connector: Codex Review: Didn't find any major issues. 🚀

@hsliuustc0106 (Collaborator): @SamitHuang @ZJY0516 @wtomin PTAL

@ZJY0516 (Collaborator) commented Jan 20, 2026: cc @knlnguyen1802

@ZJY0516 (Collaborator) left a comment: Overall LGTM. Let's wait for #847.

        return MultiprocDiffusionExecutor

    try:
        executor_class = resolve_obj_by_qualname(backend)
@ZJY0516 (Collaborator), Jan 20, 2026: In which scenario do we need a class name instead of "mp" or "ray"?

@natureofnature (Contributor, Author): For example, in the RL case, @knlnguyen1802 will define his own executor and worker to enable user-defined worker functions (#686). @ZJY0516

Contributor: I think for clarity you can make it explicit as an "external_backend". See the example from vLLM: https://github.com/vllm-project/vllm/blob/8be263c3fb1f98d85bd6a06d52e6036057f8814e/vllm/v1/executor/abstract.py#L73

@natureofnature (Contributor, Author), Jan 21, 2026:

Re the "external_backend" suggestion: I updated the code to align with vLLM for this function; an external backend now goes through the isinstance(distributed_executor_backend, str) branch.
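For reference, the dispatch shape under discussion looks roughly like this. It is a sketch mirroring vLLM's Executor.get_class: only MultiprocDiffusionExecutor and resolve_obj_by_qualname appear in the diff above, the rest is assumed, and the import of MultiprocDiffusionExecutor is elided because its module path is not shown:

    from vllm.utils import resolve_obj_by_qualname


    def get_executor_class(distributed_executor_backend):
        # A class can be passed directly, e.g. a custom RL executor.
        if isinstance(distributed_executor_backend, type):
            return distributed_executor_backend
        # Built-in shorthand for the default multiprocess backend.
        if distributed_executor_backend == "mp":
            return MultiprocDiffusionExecutor
        # External backend: a fully qualified class path, e.g.
        # "my_pkg.executors.RLDiffusionExecutor" (hypothetical name).
        if isinstance(distributed_executor_backend, str):
            return resolve_obj_by_qualname(distributed_executor_backend)
        raise ValueError(
            f"Unknown distributed_executor_backend: {distributed_executor_backend!r}"
        )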


    def close(self) -> None:
        self._finalizer()
        if hasattr(self, "executor"):
Contributor: Why wouldn't the engine have an executor attribute here?

@natureofnature (Contributor, Author), Jan 21, 2026: This is defensive programming to handle cases where DiffusionEngine initialization fails before the executor is fully assigned.
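In other words (a sketch; only the guard appears in the diff, the shutdown call in the body is an assumption):

    def close(self) -> None:
        self._finalizer()
        # If __init__ raised before self.executor was assigned, there is
        # nothing to shut down; the guard avoids a spurious AttributeError.
        if hasattr(self, "executor"):
            self.executor.shutdown()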

Comment on lines -17 to -20 (removed code):

    def __new__(cls, *args, **kwargs):
        if not cls._instance:
            cls._instance = super().__new__(cls)
        return cls._instance
Contributor: Why is the singleton pattern being dropped?

@natureofnature (Contributor, Author): I think the scheduler manages resources that should be specific to a single DiffusionEngine instance, not global to the entire process.
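For illustration (hypothetical usage): with the removed __new__ override, Scheduler() always returned the same cached object, so two engines in one process would silently share one scheduler and its queues.

    scheduler_a = Scheduler()
    scheduler_b = Scheduler()
    # Under the old singleton __new__, this assertion would fail because both
    # names point at cls._instance; after this PR each engine gets its own.
    assert scheduler_a is not scheduler_b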

Comment on lines +55 to +56:

    self.scheduler = Scheduler()
    self.scheduler.initialize(self.od_config)
Contributor: Doesn't the scheduler belong with the engine, if aligning with vLLM's design?

@natureofnature (Contributor, Author), Jan 21, 2026:

Currently, the Scheduler effectively acts as a communicator between the engine and the workers via its internal MessageQueue, and the plugged executor needs it for that communication as well; that is why it is kept in the executor. Since the executor is owned by the engine, the scheduler remains an instance-level resource, strictly aligning with vLLM's design.
In the future, I suggest we move the communication channel from the scheduler to the executor to better separate their concerns. However, this commit focuses on the architecture update, and I prefer not to modify the internal workflow too drastically at this stage.
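A sketch of the ownership chain this describes, building on the interface sketch earlier in the thread (class and attribute names other than Scheduler are assumptions): the engine owns the executor, and the executor owns the scheduler, whose MessageQueue carries engine-to-worker traffic.

    class MultiprocDiffusionExecutor(DiffusionExecutor):
        def _init_executor(self) -> None:
            # The scheduler stays in the executor for now because its internal
            # MessageQueue is the engine<->worker communication channel.
            self.scheduler = Scheduler()
            self.scheduler.initialize(self.od_config)
            # ...spawn worker processes that attach to that queue...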

Collaborator: Let's keep that for a future PR.

@natureofnature (Contributor, Author): @codex review

@chatgpt-codex-connector: Codex Review: Didn't find any major issues. Already looking forward to the next diff.

@ZJY0516 added the "ready" label (trigger Buildkite CI) on Jan 21, 2026
Signed-off-by (all 3 commits): wzliu <wzliu@connect.hku.hk>
Comment on lines +55 to +56:

    self.scheduler = Scheduler()
    self.scheduler.initialize(self.od_config)
Contributor: These two lines can be merged into one; the initialize call can be moved inside the constructor.
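The suggested shape, as a sketch (Scheduler's internals are assumptions):

    class Scheduler:
        def __init__(self, od_config) -> None:
            # The former initialize(od_config) body moves in here, so a
            # Scheduler can never exist half-initialized.
            self.od_config = od_config

    # The call site then shrinks to one line:
    # self.scheduler = Scheduler(self.od_config)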

@knlnguyen1802 (Contributor) left a comment: Overall, design and logic LGTM, thanks for the work.

@hsliuustc0106 (Collaborator) left a comment: LGTM, please update the design doc for the diffusion module @SamitHuang

@hsliuustc0106 merged commit c9d7cd1 into vllm-project:main on Jan 21, 2026. 7 checks passed.

Labels: ready (label to trigger Buildkite CI)

5 participants