Implement timestep_conditioning [Ready for review] #3411
Yash-Vijay29 wants to merge 16 commits into openvinotoolkit:master
Conversation
Pull request overview
Adds end-to-end support for VAE timestep conditioning in the LTX video generation path, exposing the new capability through the C++ and Python APIs and validating it with Python tests.
Changes:
- Extend `Text2VideoPipeline::decode()` and `AutoencoderKLLTXVideo::decode()` to accept `decode_timestep` (defaulting to `0.0f`).
- Pass the normalized last scheduler timestep into VAE decode inside `LTXPipeline`.
- Expose `timestep_conditioning` in the Python config binding and add Python tests covering config exposure and pipeline decode API behavior.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/python_tests/test_video_generation.py | Adds tests for timestep_conditioning exposure and Text2VideoPipeline.decode() accepting an optional decode_timestep. |
| src/python/py_video_generation_pipelines.cpp | Exposes Text2VideoPipeline.decode(latent, decode_timestep=0.0) to Python with GIL release and docstring. |
| src/python/py_video_generation_models.cpp | Exposes AutoencoderKLLTXVideo::Config::timestep_conditioning and adds decode_timestep arg to VAE decode binding. |
| src/cpp/src/video_generation/text2video_pipeline.cpp | Implements new Text2VideoPipeline::decode(latent, decode_timestep) forwarding to impl. |
| src/cpp/src/video_generation/models/autoencoder_kl_ltx_video.cpp | Implements timestep input handling in reshape/decode when timestep_conditioning is enabled. |
| src/cpp/src/video_generation/ltx_pipeline.hpp | Computes decode_timestep from scheduler timesteps and passes it into VAE decode; updates pipeline decode signature. |
| src/cpp/include/openvino/genai/video_generation/text2video_pipeline.hpp | Updates public API and docs for decode_timestep. |
| src/cpp/include/openvino/genai/video_generation/autoencoder_kl_ltx_video.hpp | Updates public API and docs for decode_timestep on VAE decode. |
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (4)
src/cpp/include/openvino/genai/video_generation/autoencoder_kl_ltx_video.hpp:56
- The comment says normalization is `timestep / 1000`, but the actual normalization in the pipeline is `timestep / timesteps.front()` (effectively `timestep / num_train_timesteps`). Consider updating the wording to avoid implying the divisor is always exactly 1000.
// When timestep_conditioning is enabled in the config, decode_timestep must be
// the last scheduler timestep normalized to [0, 1] (i.e., timestep / 1000).
// For models without timestep_conditioning, the value is ignored.
ov::Tensor decode(const ov::Tensor& latent, float decode_timestep = 0.0f);
src/cpp/src/video_generation/ltx_pipeline.hpp:676
`LTXPipeline::decode()` updates the shared `m_perf_metrics.vae_decoder_inference_duration`. Since user callbacks are executed on a worker thread (`ThreadedCallbackWrapper`), calling `pipe.decode()` from inside a callback will write to `m_perf_metrics` concurrently with the main generation thread, causing a data race (UB) and potentially corrupting perf metrics. Consider returning perf stats computed locally for `decode()` (or guarding perf metrics with a mutex / making `decode()` not mutate shared state).
VideoGenerationResult decode(const ov::Tensor& latent, float decode_timestep = 0.0f) {
ov::Tensor postprocessed = postprocess_latents(latent);
const auto decode_start = std::chrono::steady_clock::now();
ov::Tensor video = m_vae->decode(postprocessed, decode_timestep);
m_perf_metrics.vae_decoder_inference_duration =
std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::steady_clock::now() - decode_start)
.count();
    return VideoGenerationResult{video, m_perf_metrics};
}
src/python/py_video_generation_pipelines.cpp:118
- Docstring states normalization as `timestep / 1000`, but the scheduler's `num_train_timesteps` is configurable (even if typically 1000). Consider updating the wording to `timestep / num_train_timesteps` (or `timestep / max_timestep`) to match the C++ implementation and avoid confusion.
decode_timestep (float): Last scheduler timestep normalized to [0, 1] (timestep / 1000).
Required when the VAE config has timestep_conditioning=True (e.g., LTX-Video 0.9.1+).
Ignored for models without timestep conditioning.
src/python/py_video_generation_models.cpp:227
- Docstring hard-codes normalization as `timestep / 1000`, but the scheduler config can change `num_train_timesteps`. Consider wording this as `timestep / num_train_timesteps` (or `timestep / max_timestep`) for accuracy and consistency with the pipeline's normalization logic.
decode_timestep (float): Last scheduler timestep normalized to [0, 1] (timestep / 1000).
Required when the VAE config has timestep_conditioning=True (e.g., LTX-Video 0.9.1+).
Ignored for models without timestep conditioning.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (1)
tests/python_tests/test_video_generation.py:318
- This comment assumes `decode_timestep=0.5` corresponds to `500/1000`, but the normalization is described elsewhere as `timestep / max_timestep` (scheduler-dependent). Please reword to avoid hard-coding 1000 so the test comment stays accurate if scheduler configs/models change.
# decode_timestep=0.5 corresponds to scheduler timestep 500 / 1000; ignored for non-conditioning models.
result = pipe.decode(latent_tensor, decode_timestep=0.5)
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (3)
src/cpp/include/openvino/genai/video_generation/autoencoder_kl_ltx_video.hpp:56
- Header comment says `decode_timestep` is `timestep / 1000`, but scheduler `num_train_timesteps` is configurable and `Text2VideoPipeline` docs describe normalization as `timestep / max_timestep` (typically `num_train_timesteps`). Please update this comment to avoid hardcoding `1000` and keep documentation consistent across the API surface.
// When timestep_conditioning is enabled in the config, decode_timestep must be
// the last scheduler timestep normalized to [0, 1] (i.e., timestep / 1000).
// For models without timestep_conditioning, the value is ignored.
ov::Tensor decode(const ov::Tensor& latent, float decode_timestep = 0.0f);
src/cpp/src/video_generation/models/autoencoder_kl_ltx_video.cpp:186
`ov::Tensor ts` is allocated on every `decode()` call when `timestep_conditioning` is enabled. If `decode()` is used inside callbacks to preview intermediate results, this repeated allocation can add overhead. Consider caching/reusing a `{1}` f32 tensor (e.g., as a member) and just updating its value before `infer()`.
if (m_config.timestep_conditioning) {
ov::Tensor ts(ov::element::f32, {1});
ts.data<float>()[0] = decode_timestep;
m_decoder_request.set_tensor("timestep", ts);
}
tests/python_tests/test_video_generation.py:318
- The inline comment assumes normalization is `timestep / 1000` ("500 / 1000"), but the scheduler's `num_train_timesteps` is configurable and the C++ API docs describe normalization as `timestep / max_timestep` (typically `num_train_timesteps`). Consider rewording this comment to avoid hardcoding 1000 and just state that `0.5` is a representative normalized timestep value.
# decode_timestep=0.5 corresponds to scheduler timestep 500 / 1000; ignored for non-conditioning models.
result = pipe.decode(latent_tensor, decode_timestep=0.5)
Force-pushed c2bc0a0 to 4243dcc
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/cpp/include/openvino/genai/video_generation/generation_config.hpp:76
`VideoGenerationConfig` is a public struct; inserting new fields (`decode_timestep`, `image_cond_noise_scale`) in the middle changes the offsets of existing members that follow (e.g., `taylorseer_config`, `adapters`), which is a stronger ABI break than appending new fields at the end. If maintaining C++ ABI for existing clients is a goal, consider adding new fields at the end of the struct (or moving the struct behind a pImpl/versioned wrapper).
/// Decode-time timestep for timestep-conditioned VAE decoders.
/// std::nullopt uses pipeline default which is 0.0f for LTX-Video pipeline runtime.
/// This value is forwarded to VAE only when VAE config enables timestep_conditioning.
std::optional<float> decode_timestep = std::nullopt;
/// Decode-time image conditioning noise scale for timestep-conditioned VAE decoders.
/// std::nullopt uses pipeline default which is 0.0f for LTX-Video pipeline runtime.
/// This value is forwarded to VAE only when VAE config enables timestep_conditioning.
std::optional<float> image_cond_noise_scale = std::nullopt;
/**
* TaylorSeer configuration for caching transformer outputs.
* When set, enables TaylorSeer Lite acceleration which skips some transformer inferences
* and predicts outputs using Taylor series approximation.
*/
std::optional<TaylorSeerCacheConfig> taylorseer_config;
Removed error handling for video loading and skipped pairs tracking.
Description
Wrote logic for timestep conditioning
Wrote tests for timestep conditioning
Tested against LTX-Video-0.9.1
Changes:
@likholat
So I patched optimum-intel to work with LTX-Video 0.9.1 (it has timestep_conditioning in its decoder) and modified the WWB CLI to help benchmark these models.
I also updated the documentation to include the extra parameters for timestep_conditioning.
I opened a PR against optimum-intel as well, since it didn't support exporting timestep_conditioning models to IR format either:
huggingface/optimum-intel#1652
Together with this PR it should allow inference to work with LTX-Video 0.9.1 at least. Other similar models from the LTX family should work too.
TESTING METHOD FOR TIMESTEP:
first ran
wwb --base-model Lightricks/LTX-Video-0.9.1 --gt-data video_gen_test_ts/gt.csv --model-type text-to-video --hf --decode-timestep 0.05 --decode-noise-scale 0.025 --num-samples 5
then ran
wwb --target-model ltx-video-0.9.1-ov --gt-data video_gen_test_ts/gt.csv --model-type text-to-video --genai --output ltx_video_genai_ts --decode-timestep 0.05 --decode-noise-scale 0.025 --num-samples 5
Accuracy with timestep 0.05 and decode-noise-scale 0.025:
0.76939785
Regular HF took 53 minutes to complete.
GenAI pipeline took 40 minutes to complete.
attaching metrics:
metrics_per_question.csv
TESTING FOR LTX-Video 0.9.1 with TIMESTEP OFF
first ran
wwb --target-model ltx-video-0.9.1-ov --gt-data video_gen_test_ts/gt.csv --model-type text-to-video --genai --output ltx_video_genai_ts --decode-timestep 0 --decode-noise-scale 0 --num-samples 5
then ran
wwb --target-model ltx-video-0.9.1-ov --gt-data video_gen_test_ts/gt.csv --model-type text-to-video --genai --output ltx_video_genai_ts --decode-timestep 0 --decode-noise-scale 0 --num-samples 5
Similarity score over the 5 prompts:
0.751931
Attaching metrics per question:
metrics_per_question_ts_off.csv
Let me know if you need other tests run or some changes to the codebase.
LTX-Video 0.9.1 works as far as I can tell; other similar models should too, hopefully.
Fixes #3410
Checklist: