[Core] Add Diffusion executor #865
Conversation
Force-pushed 8b9dc2a to 68b12b3
@codex review
Codex Review: Didn't find any major issues. Another round soon, please!
Force-pushed 68b12b3 to 58fbd80
@codex review
Codex Review: Didn't find any major issues. 🚀
@SamitHuang @ZJY0516 @wtomin PTAL
return MultiprocDiffusionExecutor
...
try:
    executor_class = resolve_obj_by_qualname(backend)
In which scenario do we need a class name instead of "mp" or "ray"?
For example, in the RL case, @knlnguyen1802 will define his own executor and worker to enable user-defined worker functions (#686). @ZJY0516
I think for clarity you can make it explicit as an "external_backend". See the example from vLLM: https://github.com/vllm-project/vllm/blob/8be263c3fb1f98d85bd6a06d52e6036057f8814e/vllm/v1/executor/abstract.py#L73
I updated the code to align with vLLM for this function; a user-defined executor goes through the isinstance(distributed_executor_backend, str) branch.
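To make the discussion concrete, here is a minimal sketch of that vLLM-style resolution, with assumed names (get_executor_class, RayDiffusionExecutor, and the local resolve_obj_by_qualname stub are illustrative; only MultiprocDiffusionExecutor and the string-branch idea come from this thread):

```python
# Hypothetical sketch: literal backends map to built-in executors, and any
# other string is treated as a fully qualified class path for an external,
# user-defined executor.
import importlib


class MultiprocDiffusionExecutor: ...   # stand-in for the built-in "mp" executor
class RayDiffusionExecutor: ...         # stand-in for a "ray" executor


def resolve_obj_by_qualname(qualname: str):
    # Assumed helper: "pkg.module.ClassName" -> class object.
    module_name, _, obj_name = qualname.rpartition(".")
    return getattr(importlib.import_module(module_name), obj_name)


def get_executor_class(distributed_executor_backend):
    if distributed_executor_backend == "mp":
        return MultiprocDiffusionExecutor
    if distributed_executor_backend == "ray":
        return RayDiffusionExecutor
    if isinstance(distributed_executor_backend, str):
        # External backend, e.g. "my_pkg.rl.RLDiffusionExecutor" for the RL case.
        return resolve_obj_by_qualname(distributed_executor_backend)
    raise ValueError(f"Unknown executor backend: {distributed_executor_backend!r}")
```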
def close(self) -> None:
    self._finalizer()
    if hasattr(self, "executor"):
Why wouldn't the engine have an executor attribute here?
This is defensive programming to handle cases where DiffusionEngine initialization fails before the executor is fully assigned.
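A minimal sketch of that failure mode, with assumed structure (the real engine's finalizer and executor setup will differ):

```python
# If __init__ raises before self.executor is assigned, cleanup paths
# (__del__, weakref finalizers, atexit hooks) can still reach close() on a
# half-built engine, so close() must not assume the attribute exists.
import weakref


class DiffusionEngine:
    def __init__(self, fail_early: bool = False):
        self._finalizer = weakref.finalize(self, lambda: None)
        if fail_early:
            raise RuntimeError("init failed before the executor was created")
        self.executor = object()  # stand-in for the real executor

    def close(self) -> None:
        self._finalizer()
        if hasattr(self, "executor"):   # guard for partially built engines
            pass                        # executor.shutdown() in the real code
```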
def __new__(cls, *args, **kwargs):
    if not cls._instance:
        cls._instance = super().__new__(cls)
    return cls._instance
Why is the singleton pattern being dropped?
I think the scheduler manages resources that should be specific to a single DiffusionEngine instance, not global to the entire process.
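A small demonstration of the problem with the dropped pattern (simplified, assumed class):

```python
# With __new__ caching a single instance, two engines in one process would
# silently share one scheduler and its state.
class SingletonScheduler:
    _instance = None

    def __new__(cls, *args, **kwargs):
        if not cls._instance:
            cls._instance = super().__new__(cls)
        return cls._instance


a = SingletonScheduler()
b = SingletonScheduler()
assert a is b  # both "engines" end up with the same scheduler object
```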
self.scheduler = Scheduler()
self.scheduler.initialize(self.od_config)
Doesn't the scheduler belong with the engine, if we're aligning with the vLLM design?
Currently, the Scheduler effectively acts as a communicator between the engine and the workers via its internal MessageQueue, and a plugged-in executor needs it for communication as well; that's why it's kept in the executor. Since the executor is owned by the engine, the scheduler remains an instance-level resource, strictly aligning with vLLM's design.
In the future, I suggest we move the communication channel from the scheduler to the executor to better separate their concerns. However, this commit focuses on architecture updates, and I prefer not to modify the internal workflow too drastically at this stage.
Let's keep that for a future PR.
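For reference, a rough sketch of the current wiring as described above; everything besides the Scheduler/MessageQueue idea is an assumed name, not the PR's exact code:

```python
# The Scheduler owns the MessageQueue used to talk to workers, so the plugged
# executor needs a scheduler reference just to communicate; moving the channel
# into the executor (the suggested future PR) would decouple the two.
import multiprocessing as mp


class MessageQueue:
    def __init__(self):
        self._q = mp.Queue()

    def put(self, msg):
        self._q.put(msg)

    def get(self):
        return self._q.get()


class Scheduler:
    def __init__(self):
        self.mq = MessageQueue()  # communication channel lives here today


class MultiprocDiffusionExecutor:
    def __init__(self, scheduler):
        self.scheduler = scheduler  # executor borrows the scheduler's channel

    def submit(self, request):
        self.scheduler.mq.put(request)
```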
@codex review
Codex Review: Didn't find any major issues. Already looking forward to the next diff.
Signed-off-by: wzliu <wzliu@connect.hku.hk>
Force-pushed f26c4e3 to f8782d1
self.scheduler = Scheduler()
self.scheduler.initialize(self.od_config)
These two lines can be merged into one: the initialize call can be moved inside the constructor.
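Something like this, assuming the Scheduler's signature can change (sketch only):

```python
# Fold initialize() into the constructor so callers create a ready scheduler
# in one step; od_config is the engine config object from this PR.
class Scheduler:
    def __init__(self, od_config):
        self.od_config = od_config
        self._setup(od_config)  # formerly the body of initialize()

    def _setup(self, od_config):
        ...  # build queues and internal state from od_config


# Engine side, one line instead of two:
# self.scheduler = Scheduler(self.od_config)
```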
knlnguyen1802 left a comment
Overall, design and logic LGTM, thanks for the work.
hsliuustc0106 left a comment
LGTM, please update the design doc for the diffusion module @SamitHuang
Purpose
This PR aligns the diffusion execution stack with vLLM's model executor structure, extracting executor responsibilities from the engine and making the executor pluggable. The refactor lays the foundation for worker/actor-based execution and enables swapping in RL-specific executors. In the future, a Ray-based worker actor could then easily be added to support distributed execution.
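As a rough illustration of the resulting shape (illustrative names throughout; not the PR's exact interfaces):

```python
# The engine owns a pluggable executor chosen by backend, mirroring vLLM's
# model-executor split; RL or Ray variants plug in by subclassing the base.
from abc import ABC, abstractmethod


class DiffusionExecutorBase(ABC):
    def __init__(self, od_config):
        self.od_config = od_config

    @abstractmethod
    def execute(self, request): ...

    @abstractmethod
    def shutdown(self): ...


class InprocExecutor(DiffusionExecutorBase):
    def execute(self, request):
        return f"ran {request}"

    def shutdown(self):
        pass


class DiffusionEngine:
    def __init__(self, od_config, executor_class):
        self.executor = executor_class(od_config)  # engine owns the executor

    def generate(self, request):
        return self.executor.execute(request)


engine = DiffusionEngine(od_config={}, executor_class=InprocExecutor)
print(engine.generate("denoise-step"))
```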
Key goals:
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.