[Model] Add Wan2.2 I2V and TI2V pipeline support #329
hsliuustc0106 merged 11 commits into vllm-project:main from …
Conversation
SamitHuang left a comment:
Please update the supported model list as well.
```diff
@@ -1,8 +1,6 @@
 from typing import Any
```
Why do we need to move the imports?
I didn't install the cache dependency at that time. I can revert it.
docs/models/supported_models.md (outdated)

```diff
 | `QwenImageEditPipeline` | Qwen-Image-Edit | `Qwen/Qwen-Image-Edit` |
 | `ZImagePipeline` | Z-Image | `Tongyi-MAI/Z-Image-Turbo` |
-| `WanPipeline` | Wan2.2 | `Wan-AI/Wan2.2-T2V-A14B-Diffusers` |
+| `Wan22Pipeline` | Wan2.2-T2V | `Wan-AI/Wan2.2-T2V-A14B-Diffusers` |
```
Use `Wan22Pipeline` for all T2V, I2V, and TI2V models?
The three variants have different architectures (I2V-A14B uses a CLIP encoder + MoE; TI2V-5B uses a single transformer with VAE conditioning). Consolidation is possible but adds model-selection complexity to a single pipeline. I'm leaning toward keeping separate pipelines, which follows diffusers' convention, where T2V and I2V also have different pipeline classes (see the sketch below). Let me know if you prefer otherwise.
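For reference, here is a minimal sketch of that split-class convention in diffusers. The model IDs are the ones in this PR, while the dtype, prompt, and input file are illustrative assumptions, not code from this change:

```python
# Sketch of diffusers' split-pipeline convention (illustrative, not this PR's code).
import torch
from diffusers import WanPipeline, WanImageToVideoPipeline
from diffusers.utils import load_image

# T2V: text prompt only, no image conditioning.
t2v = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-T2V-A14B-Diffusers", torch_dtype=torch.bfloat16
)

# I2V: a separate class that additionally conditions on a first-frame image.
i2v = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.2-I2V-A14B-Diffusers", torch_dtype=torch.bfloat16
)
frames = i2v(
    image=load_image("input.jpg"),  # hypothetical input file
    prompt="A cat walking in the garden",
    num_frames=17,
).frames[0]
```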
For the pipeline name, I think we should follow the identifier in Wan2.2-T2V-A14B-Diffusers/model_index.json: `"_class_name": "WanPipeline"`.
hsliuustc0106 left a comment:
Let's get it done ASAP.
@SamitHuang @hsliuustc0106 Please review again. I used the identifier names from the corresponding model_index.json files.
Could you please submit a relevant recipe to vllm/recipe?
Currently, we only support Cache-DiT acceleration for Wan, so a recipe briefly mentioning this would be perfect.
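For context, a rough sketch of what such a recipe snippet might look like, assuming cache-dit's documented one-line `enable_cache` entry point; the exact API, defaults, and model pairing should be verified against the cache-dit docs:

```python
# Hypothetical recipe sketch: Cache-DiT acceleration for a Wan pipeline.
# cache_dit.enable_cache is assumed from cache-dit's documented usage.
import torch
import cache_dit
from diffusers import WanPipeline

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-T2V-A14B-Diffusers", torch_dtype=torch.bfloat16
)
cache_dit.enable_cache(pipe)  # cache DiT block outputs across denoising steps

video = pipe(prompt="A cat walking in the garden", num_frames=17).frames[0]
```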
hsliuustc0106 left a comment:
LGTM. Later we are going to test USP & CFG parallel when the feature is ready.
Purpose
Add support for Wan2.2 Image-to-Video (I2V) and Text-Image-to-Video (TI2V) pipelines.
Test Plan
Test 1: I2V-A14B (Image-to-Video with MoE)

```bash
python examples/offline_inference/image_to_video/image_to_video.py \
    --model Wan-AI/Wan2.2-I2V-A14B-Diffusers \
    --image input.jpg --num_frames 17 --num_inference_steps 10 \
    --output i2v_a14b_output.mp4
```

Test 2: TI2V-5B I2V mode (Image-to-Video with unified model)

```bash
python examples/offline_inference/image_to_video/image_to_video.py \
    --model Wan-AI/Wan2.2-TI2V-5B-Diffusers \
    --image input.jpg --num_frames 17 --num_inference_steps 10 \
    --output ti2v_5b_i2v_output.mp4
```

Test 3: TI2V-5B T2V mode (Text-only, no image)

```bash
python examples/offline_inference/text_to_video/text_to_video.py \
    --model Wan-AI/Wan2.2-TI2V-5B-Diffusers \
    --prompt "A cat walking in the garden" \
    --num_frames 17 --num_inference_steps 10 \
    --output ti2v_5b_t2v_output.mp4
```

Test Result
i2v_a14b_output.mp4
ti2v_5b_i2v_output.mp4
ti2v_5b_t2v_output.mp4
Essential Elements of an Effective PR Description Checklist
- [x] Update `supported_models.md` and `examples` for a new model.