
[RFC]: Add FP8 quantization support for Wan 2.2 Transformer #1042

@lishunyang12

Description


Motivation.

FP8 quantization support was recently added for the Z-Image DiT in commit b7604ae. This feature enables significant memory reduction and potential speedups on supported hardware (Ada Lovelace / Hopper GPUs).

Wan 2.2 is a video generation model that would greatly benefit from FP8 quantization due to its high memory requirements for 3D video transformers. This issue tracks extending the same FP8 quantization infrastructure to Wan 2.2.

Current State

  • FP8 quantization framework exists in vllm_omni/diffusion/quantization/
  • Z-Image transformer fully supports FP8 via quant_config parameter
  • Wan 2.2 transformer (WanTransformer3DModel) does not accept quantization config

Proposed Change.

1. Transformer Layer Modifications

File: vllm_omni/diffusion/models/wan2_2/wan2_2_transformer.py

| Class | Changes Required |
|---|---|
| `WanSelfAttention` | Add `quant_config` parameter; pass it to `QKVParallelLinear` and the output projection |
| `WanCrossAttention` | Add `quant_config` parameter; pass it to the Q/K/V and output linear layers |
| `WanTransformerBlock` | Add `quant_config` parameter; propagate to attention and FFN layers |
| `WanTransformer3DModel` | Add `quant_config` parameter; propagate to all transformer blocks |

2. Pipeline Integration

File: vllm_omni/diffusion/models/wan2_2/pipeline_wan2_2.py

| Step | Description |
|---|---|
| 1 | Extract the quantization config from `OmniDiffusionConfig` using `get_vllm_quant_config_for_layers()` |
| 2 | Pass the extracted `quant_config` to `WanTransformer3DModel` initialization |

3. CLI Support

Directory: examples/offline_inference/video/ (if applicable / create if missing)

  • Add --quantization argument
    • Support fp8 option (e.g. --quantization fp8)
    • Should enable FP8 quantization flow when provided
    • Keep backward compatibility (no change when argument is omitted)
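A minimal argparse sketch for the flag described above (flag name from this issue; defaulting to `None` preserves the existing non-quantized path):

```python
import argparse

parser = argparse.ArgumentParser(description="Wan 2.2 offline video inference")
parser.add_argument(
    "--quantization",
    choices=["fp8"],
    default=None,  # omitted => original behavior, no quantization
    help="Enable FP8 quantization (requires supported hardware)",
)

args = parser.parse_args(["--quantization", "fp8"])
assert args.quantization == "fp8"

args = parser.parse_args([])
assert args.quantization is None  # backward compatible
```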

4. Tests

Directory: tests/diffusion/models/wan2_2/ (add to existing or create new test file(s))

| Test Case | Description |
|---|---|
| FP8 forward pass | Run a full forward pass with `quant_config` enabled (FP8) and verify no crashes / reasonable outputs |
| Config propagation verification | Check that `quant_config` reaches all linear layers in the transformer blocks (self-attn QKV/out, cross-attn QKV/out, FFN linears) |
| Null config fallback | Ensure existing Wan 2.2 behavior is unchanged when `quant_config=None` (no quantization applied) |

Acceptance Criteria

  • WanTransformer3DModel constructor accepts quant_config parameter
  • All relevant linear layers inside Wan 2.2 transformer receive the quantization config
  • Pipeline correctly extracts quantization config from OmniDiffusionConfig and passes it downstream
  • When quant_config=None, model runs in original (non-quantized) mode with unchanged functionality & outputs
  • All added / modified unit tests pass
  • (Optional but recommended) Example inference script demonstrates successful FP8 usage (e.g. in examples/)

Reference Implementation

Use the Z-Image FP8 implementation as the main reference:

| Component | File Path | Notes |
|---|---|---|
| Transformer | `vllm_omni/diffusion/models/z_image/z_image_transformer.py` | Shows how `quant_config` is threaded through attention / feed-forward layers |
| Pipeline | `vllm_omni/diffusion/models/z_image/pipeline_z_image.py` | Shows extraction from config and passing to model init |
| Commit | b7604ae | Full diff / context for the Z-Image FP8 PR |

Additional Context

Hardware Requirements for FP8:

| Quantization Mode | Supported GPUs | Compute Capability |
|---|---|---|
| Full W8A8 (weights + activations) | Ada Lovelace, Hopper | SM 89+ |
| Weight-only FP8 | Turing and newer | SM 75+ (falls back to W8A16 on Ampere via Marlin kernels) |
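The capability gating above can be expressed as a tiny helper. This is an illustrative sketch only (`fp8_mode` and its return labels are hypothetical, not an existing vLLM API); in practice the framework performs this selection internally from the device's compute capability:

```python
def fp8_mode(sm: int):
    """Pick an FP8 mode from the SM (compute capability) number,
    following the hardware table above."""
    if sm >= 89:            # Ada Lovelace / Hopper: full W8A8
        return "w8a8"
    if sm >= 75:            # Turing and newer: weight-only FP8
        return "weight-only"
    return None             # older GPUs: no FP8 support


assert fp8_mode(90) == "w8a8"        # Hopper
assert fp8_mode(80) == "weight-only" # Ampere (runs W8A16 via Marlin)
assert fp8_mode(70) is None          # Volta: unsupported
```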

Related Links / References:

Feedback Period.

No response

CC List.

@ZJY0516 @hsliuustc0106 @SamitHuang @david6666666

Any Other Things.

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
