[Model] Add Wan2.2 text-to-video support #202
hsliuustc0106 merged 18 commits into vllm-project:main
Conversation
Signed-off-by: linyueqian <[email protected]>
Nice work. Can you try to increase …
hsliuustc0106 left a comment:
After this, I suggest you try to link this with fastwan, proposed in the fastvideo project. Let's see how this can accelerate our inference and provide a solution that coordinates with fastvideo.
@SamitHuang I tried with 40 steps and it takes about five minutes to generate. wan22_output_50.mp4
We may need some acceleration methods to speed up generation.
Please add this model to supported_models.md.
```python
if isinstance(video_array, np.ndarray) and video_array.ndim == 4:
    video_array = list(video_array)

export_to_video(video_array, str(output_path), fps=16)
```
fps can be 24 too; it's better to make it configurable via argparse.
Got it. I changed it accordingly.
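Following up on the suggestion above, here is a minimal sketch of how the fps flag could be wired up via argparse; the helper names and defaults are assumptions for illustration, not the PR's actual code:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Expose fps on the CLI so 16 or 24 can be chosen per run
    # (flag name and default are assumed, not taken from the PR).
    parser = argparse.ArgumentParser(description="Wan2.2 text-to-video demo")
    parser.add_argument("--fps", type=int, default=16,
                        help="frames per second for the exported video (e.g. 16 or 24)")
    return parser


def to_frame_list(video_array):
    # export_to_video expects a list of frames; unpack a 4D
    # (frames, height, width, channels) array into per-frame items.
    # getattr keeps this sketch free of a hard numpy dependency.
    if getattr(video_array, "ndim", None) == 4:
        return list(video_array)
    return video_array
```

With something like this, the export call becomes `export_to_video(to_frame_list(video), str(output_path), fps=args.fps)`.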
I think you can update the test method and result with this new video, where …
I have updated the test result in the first comment.
When will the diffusers models support TP, CFG, USP, and distVAE?
TP/USP should be ready by the end of this month; the others are left to Q1.
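As background on the CFG item above: classifier-free guidance runs the denoiser twice per step (with and without the text condition) and blends the two predictions, which is why CFG parallelism maps naturally onto a second device. A minimal sketch of the standard blend rule, with plain floats standing in for latent tensors (an illustration, not vLLM-omni code):

```python
def cfg_combine(noise_uncond, noise_cond, guidance_scale):
    # Classifier-free guidance: move the unconditional prediction toward
    # the conditional one, scaled by guidance_scale (e.g. 4.0 in this PR).
    return [u + guidance_scale * (c - u)
            for u, c in zip(noise_uncond, noise_cond)]
```

At `guidance_scale == 1.0` this reduces to the conditional prediction; larger values push generations to follow the prompt more strongly at some cost in diversity.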
Please add the tests; refer to the qwen-image tests.
Got it. I just added the …
I think we can get this PR merged now; later we need to open a new issue for a few TODO items:
- [ ] refactor the examples/offline/video_generation/ which can be used for other video generation models
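As a rough idea of what the proposed refactor could share across video-generation examples, here is a hypothetical common argument parser; every flag name and default below is an assumption modeled on this PR's script, not existing repo code:

```python
import argparse


def build_common_parser() -> argparse.ArgumentParser:
    # Shared CLI for offline video-generation examples (hypothetical sketch):
    # each model-specific script would add its own extra flags on top.
    parser = argparse.ArgumentParser(description="Offline video generation")
    parser.add_argument("--model", required=True)
    parser.add_argument("--prompt", required=True)
    parser.add_argument("--negative_prompt", default="")
    parser.add_argument("--height", type=int, default=720)
    parser.add_argument("--width", type=int, default=1280)
    parser.add_argument("--num_frames", type=int, default=32)
    parser.add_argument("--num_inference_steps", type=int, default=40)
    parser.add_argument("--fps", type=int, default=16)
    parser.add_argument("--output", default="out.mp4")
    return parser
```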
Got it.
Signed-off-by: Fanli Lin <[email protected]>
Excellent work.



PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.
Purpose
Add support for Wan2.2 text-to-video generation.
Test Plan

```shell
python examples/offline_inference/wan22/text_to_video.py \
    --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
    --negative_prompt "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" \
    --height 720 \
    --width 1280 \
    --num_frames 32 \
    --guidance_scale 4.0 \
    --guidance_scale_high 3.0 \
    --num_inference_steps 40 \
    --fps 16 \
    --output t2v_out.mp4
```

Test Result

t2v_out.mp4
Essential Elements of an Effective PR Description Checklist
- Update supported_models.md and examples for the new model.