[Model] add flux2 klein #809

Merged

hsliuustc0106 merged 12 commits into vllm-project:main from david6666666:flux2_klein on Jan 16, 2026
Conversation

@david6666666 (Collaborator) commented Jan 16, 2026


Purpose

Add support for black-forest-labs/FLUX.2-klein.

Test Plan

vLLM-Omni:

python examples/offline_inference/text_to_image/text_to_image.py \
  --model black-forest-labs/FLUX.2-klein-9B \
  --prompt "a photo of a forest with mist swirling around the tree trunks. The word 'FLUX.2' is painted over it in big, red brush strokes with visible texture" \
  --seed 42 \
  --cfg_scale 4.0 \
  --num_images_per_prompt 1 \
  --num_inference_steps 4 \
  --guidance_scale 1.0 \
  --height 768 \
  --width 1360 \
  --output outputs/output_9b_omni.png

online serving:

vllm serve black-forest-labs/FLUX.2-klein-9B --omni --port 8091

curl -s http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "A beautiful landscape painting"}
    ],
    "extra_body": {
      "height": 1024,
      "width": 1024,
      "num_inference_steps": 4,
      "true_cfg_scale": 4.0,
      "seed": 42
    }
  }' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > klein.png
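For readers who prefer Python over the shell pipeline, the final `cut -d',' -f2- | base64 -d` stage can be replicated with the standard library. `save_image_from_data_url` is a hypothetical helper sketched here, not part of vLLM-Omni; it assumes the response carries a standard `data:image/png;base64,<payload>` URL as in the curl example above:

```python
import base64


def save_image_from_data_url(data_url: str, path: str) -> bytes:
    """Split a 'data:image/png;base64,<payload>' URL, decode the payload,
    and write the raw image bytes to `path`."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:"):
        raise ValueError("expected a data URL")
    raw = base64.b64decode(payload)
    with open(path, "wb") as f:
        f.write(raw)
    return raw
```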

Test Result

vLLM-Omni:
[generated image: output_9b_omni]

python examples/offline_inference/image_to_image/image_edit.py \
  --model black-forest-labs/FLUX.2-klein-9B \
  --image output_9b_omni.png \
  --prompt "replace the trees in the image with cyberpunk buildings." \
  --output output_image_edit.png \
  --num_inference_steps 4 \
  --cfg_scale 4.0
[generated image: output_image_edit]

diffusers:

import torch
from diffusers import Flux2KleinPipeline

dtype = torch.bfloat16
device = "cuda"

height = 768
width = 1360
prompt = "a photo of a forest with mist swirling around the tree trunks. The word 'FLUX.2' is painted over it in big, red brush strokes with visible texture"

repo_id = "black-forest-labs/FLUX.2-klein-9B"
pipeline = Flux2KleinPipeline.from_pretrained(repo_id, torch_dtype=dtype)
pipeline.to(device)
num_inference_steps = 4
guidance_scale = 1.0
generator = torch.Generator(device=device).manual_seed(42)

out = pipeline(
    prompt=prompt,
    height=height,
    width=width,
    num_inference_steps=num_inference_steps,
    guidance_scale=guidance_scale,
    generator=generator
).images[0]

out.save("output_9b.png")
[generated image: output_9b]

Signed-off-by: David Chen <530634352@qq.com>

@chatgpt-codex-connector (bot) left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 44f82a7549


Comment on lines +873 to +877
if self.do_classifier_free_guidance:
    negative_prompt = ""
    if prompt is not None and isinstance(prompt, list):
        negative_prompt = [negative_prompt] * len(prompt)
    negative_prompt_embeds, negative_text_ids = self.encode_prompt(


P2: Honor negative_prompt when building CFG embeddings

In Flux2KleinPipeline.forward, the CFG branch always sets negative_prompt to an empty string and never reads req.negative_prompt, so any user-provided negative prompt is silently ignored. This means CFG outputs cannot reflect requested negative constraints (e.g., prompts to avoid artifacts), which can change or invalidate experiment results. Consider threading negative_prompt from the request/args (as other pipelines do) and using it when encoding the negative prompt embeddings.

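The fix Codex is asking for amounts to normalizing the user-supplied negative prompt before encoding it. A minimal sketch of that normalization, using the hypothetical helper name `resolve_negative_prompt` (the actual plumbing through the request object in the pipeline would differ):

```python
def resolve_negative_prompt(prompt, negative_prompt=None):
    """Fall back to "" only when no negative prompt was supplied, and
    broadcast a single string to match a batched (list) prompt, so the
    CFG branch no longer silently discards the user's negative prompt."""
    neg = negative_prompt if negative_prompt is not None else ""
    if isinstance(prompt, list) and isinstance(neg, str):
        neg = [neg] * len(prompt)
    return neg
```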

@david6666666 david6666666 added the ready label to trigger buildkite CI label Jan 16, 2026
@david6666666 (Collaborator, Author):

@ZJY0516 @SamitHuang @hsliuustc0106 PTAL, thanks.

@ZJY0516 (Collaborator) left a comment:

Thanks for the contribution.

query = apply_rotary_emb(query, image_rotary_emb, sequence_dim=1)
key = apply_rotary_emb(key, image_rotary_emb, sequence_dim=1)

attn_output = _scaled_dot_product_attention(query, key, value, attention_mask=attention_mask)
Collaborator: Please use the attention layer in vllm_omni/diffusion.

value = torch.cat([encoder_value, value], dim=1)

if image_rotary_emb is not None:
    query = apply_rotary_emb(query, image_rotary_emb, sequence_dim=1)
Collaborator: We have a RoPE layer in vllm-omni.
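For context, here is a minimal self-contained sketch of the rotary application the snippet above performs. The actual RoPE layer in vllm-omni may have a different API; this `apply_rotary_emb` is written from scratch for illustration and assumes precomputed complex-valued frequencies:

```python
import torch


def apply_rotary_emb(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    """Rotate adjacent channel pairs of x by precomputed complex frequencies.

    x:         (batch, seq, heads, head_dim), head_dim even
    freqs_cis: complex64, broadcastable to (seq, heads, head_dim // 2)
    """
    # View each adjacent pair of channels as one complex number.
    x_pairs = x.float().reshape(*x.shape[:-1], -1, 2)
    x_complex = torch.view_as_complex(x_pairs)
    # Complex multiply applies the rotation; flatten back to real pairs.
    rotated = torch.view_as_real(x_complex * freqs_cis).flatten(-2)
    return rotated.type_as(x)
```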


# Additional text-related parameters
max_sequence_length: int | None = None
text_encoder_out_layers: tuple[int, ...] | None = None
Collaborator: Why do we need this?

return refresh_cache_context


def enable_cache_for_flux2_klein(pipeline: Any, cache_config: Any) -> Callable[[int], None]:
Collaborator: If cache-dit already supports it, do we still need this? @SamitHuang

Collaborator: Yes, because it's not a regular DiT, but we need to test it. @david6666666

@ZJY0516 ZJY0516 requested a review from SamitHuang January 16, 2026 05:37


# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_img2img.retrieve_latents
def retrieve_latents(
Collaborator: Why not just import it?



# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.retrieve_timesteps
def retrieve_timesteps(
Collaborator: Ditto.

self.dropout = dropout
self.added_kv_proj_dim = added_kv_proj_dim

self.to_q = nn.Linear(query_dim, self.inner_dim, bias=bias)
Collaborator: Why not use QKVParallelLinear or ReplicatedLinear from vllm.model_executor.layers.linear?

self.to_out = nn.ModuleList([nn.Linear(self.inner_dim, self.out_dim, bias=out_bias), nn.Dropout(dropout)])

if added_kv_proj_dim is not None:
    self.norm_added_q = nn.RMSNorm(dim_head, eps=eps)
Collaborator: Why not use RMSNorm from vLLM?

Collaborator: No need to use the vLLM norm op here; a native torch op + torch.compile will be better.

Collaborator: vLLM's RMSNorm doesn't support torch.compile?

Collaborator: It does. But the dispatch logic in vLLM differs slightly from omni diffusion's custom op, so I think decoupling is better. And in most cases vLLM doesn't use a custom RMSNorm kernel anyway.

Collaborator (Author): I tested it; the native torch op + torch.compile is better.
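Concretely, "native torch op + torch.compile" means something like the following. This is a hedged sketch of a plain-PyTorch RMSNorm written for illustration, not the exact code in this PR:

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Plain-PyTorch RMSNorm: simple elementwise math that torch.compile
    can fuse on its own, with no custom kernel required."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compute the RMS in float32 for stability, then cast back.
        var = x.float().pow(2).mean(dim=-1, keepdim=True)
        x_normed = x.float() * torch.rsqrt(var + self.eps)
        return (x_normed * self.weight.float()).type_as(x)


# The whole module can then be wrapped, e.g.:
# norm = torch.compile(RMSNorm(3072))
```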

@hsliuustc0106 (Collaborator):

In terms of the seed, I remember @SamitHuang changed the default seed to 0, as it can influence some accuracy results.

@SamitHuang (Collaborator):

> In terms of the seed, I remember @SamitHuang changed the default seed to 0, as it can influence some accuracy results.

Yes, but it doesn't affect this PR.

@hsliuustc0106 (Collaborator):

Any benchmark result for latency?

@david6666666 (Collaborator, Author):

> Any benchmark result for latency?

On an H800: 9B is 0.8 s, 4B is 0.5 s.
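For reference, per-call latency like the figures above could be measured with a harness along these lines. This is a hypothetical sketch, not the project's benchmark tooling:

```python
import time


def mean_latency(fn, warmup: int = 2, iters: int = 10) -> float:
    """Average wall-clock seconds per call after a short warmup.
    (For GPU pipelines, a torch.cuda.synchronize() before each timestamp
    would be needed to get accurate numbers.)"""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters
```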

@david6666666 (Collaborator, Author):

PR is ready.

@david6666666 (Collaborator, Author):

The vllm-omni-amd-ci failure is not related.

@hsliuustc0106 hsliuustc0106 merged commit a7f9926 into vllm-project:main Jan 16, 2026
6 of 7 checks passed
erfgss pushed a commit to erfgss/vllm-omni that referenced this pull request Jan 19, 2026
erfgss added a commit to erfgss/vllm-omni that referenced this pull request Jan 19, 2026
@david6666666 david6666666 mentioned this pull request Jan 20, 2026
55 tasks
@jannikstdl commented Jan 20, 2026:

Will it someday support image editing in the API?

@ZJY0516 (Collaborator) commented Jan 20, 2026:

> Will it someday support image editing in the API?

I think we already support online serving for image edit; see https://github.com/vllm-project/vllm-omni/tree/main/examples/online_serving/image_to_image

@jannikstdl:

@ZJY0516 OK, I'll try it in the latest vLLM-Omni Docker image. I could not find anything about that in the docs.

@david6666666 david6666666 deleted the flux2_klein branch January 22, 2026 01:57
@kolmogorov-quyet:

They don't work with parallelism acceleration. Can anyone write code that supports one of the four architectures: Ulysses-SP, Ring-SP, CFG-Parallel, or Tensor-Parallel? I've tried one H2O GPU vs. two H2O GPUs and the speed stays the same. Model FLUX.2-klein-9B, tested with --usp 2, --ring 2 (hybrid), and Tensor-Parallel.

@hsliuustc0106 (Collaborator):

@wtomin @ZJY0516 PTAL

@ZJY0516 (Collaborator) commented Jan 27, 2026:

Could you please try #973?
