Merged
1 change: 1 addition & 0 deletions docs/.nav.yml
@@ -11,6 +11,7 @@ nav:
- examples/README.md
- Offline Inference:
- Image-To-Image: user_guide/examples/offline_inference/image_to_image.md
- Image-To-Video: user_guide/examples/offline_inference/image_to_video.md
- Qwen2.5-Omni: user_guide/examples/offline_inference/qwen2_5_omni.md
- Qwen3-Omni: user_guide/examples/offline_inference/qwen3_omni.md
- Text-To-Image: user_guide/examples/offline_inference/text_to_image.md
66 changes: 66 additions & 0 deletions docs/user_guide/examples/offline_inference/image_to_video.md
@@ -0,0 +1,66 @@
# Image-To-Video

Source <https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_video>.


This example demonstrates how to generate videos from images using Wan2.2 Image-to-Video models with vLLM-Omni's offline inference API.

## Local CLI Usage

### Wan2.2-I2V-A14B-Diffusers (MoE)
```bash
python image_to_video.py \
--model Wan-AI/Wan2.2-I2V-A14B-Diffusers \
--image input.png \
--prompt "A cat playing with yarn, smooth motion" \
--negative_prompt "<optional quality filter>" \
--height 480 \
--width 832 \
--num_frames 48 \
--guidance_scale 5.0 \
--guidance_scale_high 6.0 \
--num_inference_steps 40 \
--boundary_ratio 0.875 \
--flow_shift 12.0 \
--fps 16 \
--output i2v_output.mp4
```

### Wan2.2-TI2V-5B-Diffusers (Unified)
```bash
python image_to_video.py \
--model Wan-AI/Wan2.2-TI2V-5B-Diffusers \
--image input.png \
--prompt "A cat playing with yarn, smooth motion" \
--negative_prompt "<optional quality filter>" \
--height 480 \
--width 832 \
--num_frames 48 \
--guidance_scale 4.0 \
--num_inference_steps 40 \
--flow_shift 12.0 \
--fps 16 \
--output i2v_output.mp4
```

Key arguments:

- `--model`: Model ID (I2V-A14B for MoE, TI2V-5B for unified T2V+I2V).
- `--image`: Path to input image (required).
- `--prompt`: Text description of desired motion/animation.
- `--height/--width`: Output resolution (auto-calculated from image if not set). Dimensions should be multiples of 16.
- `--num_frames`: Number of frames (default 81).
- `--guidance_scale` and `--guidance_scale_high`: CFG scale (applied to low/high-noise stages for MoE).
- `--negative_prompt`: Optional text describing artifacts to suppress (e.g., blur, distortion).
- `--boundary_ratio`: Boundary split ratio for two-stage MoE models.
- `--flow_shift`: Scheduler flow shift (5.0 for 720p, 12.0 for 480p).
- `--num_inference_steps`: Number of denoising steps (default 50).
- `--fps`: Frames per second for the saved MP4 (requires `diffusers` export_to_video).
- `--output`: Path to save the generated video.
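The resolution and frame-rate arguments above interact: when `--height`/`--width` are omitted they are derived from the input image, and the doc notes dimensions should be multiples of 16, while `--num_frames` and `--fps` together determine clip length. A minimal sketch of that rounding and the resulting duration (helper names are ours for illustration, not part of the script):

```python
def snap_to_multiple(value: int, multiple: int = 16) -> int:
    """Round a dimension down to the nearest multiple of 16 (at least one multiple)."""
    return max(multiple, (value // multiple) * multiple)

def clip_seconds(num_frames: int, fps: int) -> float:
    """Duration of the saved MP4 in seconds."""
    return num_frames / fps

# 48 frames at 16 fps -> a 3.0-second clip at 480x832
height, width = snap_to_multiple(480), snap_to_multiple(832)
print(height, width, clip_seconds(48, 16))  # 480 832 3.0
```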

## Example materials

??? abstract "image_to_video.py"
``````py
--8<-- "examples/offline_inference/image_to_video/image_to_video.py"
``````
1 change: 1 addition & 0 deletions docs/user_guide/examples/offline_inference/qwen2_5_omni.md
@@ -2,6 +2,7 @@

Source <https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen2_5_omni>.


## Setup
Please refer to the [stage configuration documentation](https://docs.vllm.ai/projects/vllm-omni/en/latest/configuration/stage_configs/) to configure memory allocation appropriately for your hardware setup.

1 change: 1 addition & 0 deletions docs/user_guide/examples/offline_inference/qwen3_omni.md
@@ -2,6 +2,7 @@

Source <https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen3_omni>.


## Setup
Please refer to the [stage configuration documentation](https://docs.vllm.ai/projects/vllm-omni/en/latest/configuration/stage_configs/) to configure memory allocation appropriately for your hardware setup.

4 changes: 4 additions & 0 deletions docs/user_guide/examples/online_serving/image_to_image.md
@@ -230,6 +230,10 @@ Provide multiple images in `content` (order matters):
``````py
--8<-- "examples/online_serving/image_to_image/openai_chat_client.py"
``````
??? abstract "run_curl_image_edit.sh"
``````sh
--8<-- "examples/online_serving/image_to_image/run_curl_image_edit.sh"
``````
??? abstract "run_server.sh"
``````sh
--8<-- "examples/online_serving/image_to_image/run_server.sh"
1 change: 1 addition & 0 deletions docs/user_guide/examples/online_serving/qwen2_5_omni.md
@@ -137,6 +137,7 @@ response = client.chat.completions.create(
# Response contains two choices: one with text, one with audio
print(response.choices[0].message.content) # Text response
print(response.choices[1].message.audio) # Audio response
```

## Streaming Output
If you want to enable streaming output, set the argument as shown below. Each stage's output is returned as soon as that stage finishes generating it. Currently only text supports streaming; other modalities are returned as complete outputs.
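As a sketch of how streamed text chunks could be assembled on the client side (the helper is ours, and the commented client call is an assumption based on the OpenAI-compatible API, not verified here):

```python
def accumulate_text(deltas):
    """Join streamed text deltas, skipping empty keep-alive chunks."""
    return "".join(d for d in deltas if d)

# With the OpenAI Python client this might look like:
#   stream = client.chat.completions.create(model=..., messages=..., stream=True)
#   text = accumulate_text(chunk.choices[0].delta.content for chunk in stream)
print(accumulate_text(["Hel", None, "lo"]))  # Hello
```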
1 change: 1 addition & 0 deletions docs/user_guide/examples/online_serving/qwen3_omni.md
@@ -145,6 +145,7 @@ response = client.chat.completions.create(
# Response contains two choices: one with text, one with audio
print(response.choices[0].message.content) # Text response
print(response.choices[1].message.audio) # Audio response
```

## Streaming Output
If you want to enable streaming output, set the argument as shown below. Each stage's output is returned as soon as that stage finishes generating it. Currently only text supports streaming; other modalities are returned as complete outputs.
1 change: 1 addition & 0 deletions examples/online_serving/qwen2_5_omni/README.md
@@ -134,6 +134,7 @@ response = client.chat.completions.create(
# Response contains two choices: one with text, one with audio
print(response.choices[0].message.content) # Text response
print(response.choices[1].message.audio) # Audio response
```

## Streaming Output
If you want to enable streaming output, set the argument as shown below. Each stage's output is returned as soon as that stage finishes generating it. Currently only text supports streaming; other modalities are returned as complete outputs.
1 change: 1 addition & 0 deletions examples/online_serving/qwen3_omni/README.md
@@ -142,6 +142,7 @@ response = client.chat.completions.create(
# Response contains two choices: one with text, one with audio
print(response.choices[0].message.content) # Text response
print(response.choices[1].message.audio) # Audio response
```

## Streaming Output
If you want to enable streaming output, set the argument as shown below. Each stage's output is returned as soon as that stage finishes generating it. Currently only text supports streaming; other modalities are returned as complete outputs.