[Model] add flux2 klein#809
Conversation
Signed-off-by: David Chen <530634352@qq.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 44f82a7549
```python
if self.do_classifier_free_guidance:
    negative_prompt = ""
    if prompt is not None and isinstance(prompt, list):
        negative_prompt = [negative_prompt] * len(prompt)
    negative_prompt_embeds, negative_text_ids = self.encode_prompt(
```
Honor negative_prompt when building CFG embeddings
In Flux2KleinPipeline.forward, the CFG branch always sets negative_prompt to an empty string and never reads req.negative_prompt, so any user-provided negative prompt is silently ignored. This means CFG outputs cannot reflect requested negative constraints (e.g., prompts to avoid artifacts), which can change or invalidate experiment results. Consider threading negative_prompt from the request/args (as other pipelines do) and using it when encoding the negative prompt embeddings.
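One way the suggested fix could look, as a standalone sketch. The helper name `resolve_negative_prompt` is an assumption for illustration; the actual fix would thread `req.negative_prompt` into `encode_prompt` as the review suggests:

```python
def resolve_negative_prompt(prompt, negative_prompt):
    """Fall back to "" only when no negative prompt was supplied, and
    broadcast a single string to match a batched (list) prompt."""
    if negative_prompt is None:
        negative_prompt = ""
    if prompt is not None and isinstance(prompt, list) and isinstance(negative_prompt, str):
        negative_prompt = [negative_prompt] * len(prompt)
    return negative_prompt
```

The CFG branch would then call this with the request's negative prompt instead of hard-coding `""`.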
@ZJY0516 @SamitHuang @hsliuustc0106 PTAL, thanks.
ZJY0516 left a comment:
Thanks for the contribution.
```python
query = apply_rotary_emb(query, image_rotary_emb, sequence_dim=1)
key = apply_rotary_emb(key, image_rotary_emb, sequence_dim=1)

attn_output = _scaled_dot_product_attention(query, key, value, attention_mask=attention_mask)
```
Please use the attention layer in `vllm_omni/diffusion`.
```python
value = torch.cat([encoder_value, value], dim=1)

if image_rotary_emb is not None:
    query = apply_rotary_emb(query, image_rotary_emb, sequence_dim=1)
```
We have a RoPE layer in vllm-omni.
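As background, the rotation that such a RoPE layer encapsulates can be sketched in a few lines of plain Python. This is illustrative only, not vllm-omni's actual `apply_rotary_emb`:

```python
import math

def apply_rotary_pairs(token, position, theta=10000.0):
    """Rotate consecutive (even, odd) feature pairs of one token by a
    position-dependent angle -- the core computation of RoPE."""
    dim = len(token)
    out = []
    for i in range(0, dim, 2):
        angle = position * theta ** (-i / dim)
        c, s = math.cos(angle), math.sin(angle)
        x1, x2 = token[i], token[i + 1]
        out += [x1 * c - x2 * s, x1 * s + x2 * c]
    return out
```

Position 0 leaves a token unchanged, and each pair keeps its norm, since the op is a pure rotation.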
vllm_omni/diffusion/request.py (outdated)
```python
# Additional text-related parameters
max_sequence_length: int | None = None
text_encoder_out_layers: tuple[int, ...] | None = None
```
```python
    return refresh_cache_context


def enable_cache_for_flux2_klein(pipeline: Any, cache_config: Any) -> Callable[[int], None]:
```
If cache-dit already supports it, do we still need this? @SamitHuang
Yes, because it's not a regular DiT; but we need to test it. @david6666666
```python
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_img2img.retrieve_latents
def retrieve_latents(
```

```python
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.retrieve_timesteps
def retrieve_timesteps(
```
Force-pushed from 0f2f766 to d531924.
```python
self.dropout = dropout
self.added_kv_proj_dim = added_kv_proj_dim

self.to_q = nn.Linear(query_dim, self.inner_dim, bias=bias)
```
Why not use `QKVParallelLinear` / `ReplicatedLinear` from `vllm.model_executor.layers.linear`?
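The fusion idea behind `QKVParallelLinear` (setting aside its tensor-parallel sharding) can be checked numerically: one matmul over the concatenated weights equals three separate projections. A small NumPy sketch, not vLLM's actual layer:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, inner, batch = 8, 8, 2
x = rng.standard_normal((batch, hidden))

# Three separate projections, as in the reviewed to_q / to_k / to_v layout.
w_q, w_k, w_v = (rng.standard_normal((hidden, inner)) for _ in range(3))
q, k, v = x @ w_q, x @ w_k, x @ w_v

# Fused layout: one matmul over the concatenated weight, then split into Q, K, V.
w_qkv = np.concatenate([w_q, w_k, w_v], axis=1)
q2, k2, v2 = np.split(x @ w_qkv, 3, axis=1)

assert np.allclose(q, q2) and np.allclose(k, k2) and np.allclose(v, v2)
```

The fused form saves kernel launches and lets tensor parallelism shard one weight instead of three.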
```python
self.to_out = nn.ModuleList([nn.Linear(self.inner_dim, self.out_dim, bias=out_bias), nn.Dropout(dropout)])

if added_kv_proj_dim is not None:
    self.norm_added_q = nn.RMSNorm(dim_head, eps=eps)
```
Why not use RMSNorm from vLLM?
No need to use the vLLM norm op here; a native torch op plus torch.compile will be better.
vLLM's RMSNorm doesn't support torch.compile?
It does, but vLLM's dispatch logic differs slightly from omni diffusion's custom op, so I think decoupling is better. And in most cases vLLM doesn't use a custom RMSNorm kernel anyway.
I tested it: the native torch op plus torch.compile is better.
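For reference, the op under discussion is only a few elementwise operations — shown here in plain Python rather than the torch version the PR uses — which is why a native implementation fuses well under `torch.compile`:

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: scale each element by the reciprocal root-mean-square
    of the vector, then apply a learned per-element weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]
```

With so little arithmetic, a compiler can fuse the whole thing into one kernel, so a hand-written custom kernel buys little here.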
Regarding the seed: I remember @SamitHuang changed the default seed to 0, as it affects some accuracy results.
Yes, but it doesn't affect this PR.
Any benchmark results for latency?
On H800: 9B at 0.8 s, 4B at 0.5 s.
The PR is ready.
The vllm-omni-amd-ci failure is unrelated.
Will the API someday support image editing?
I think we already support online serving for image editing; see https://github.com/vllm-project/vllm-omni/tree/main/examples/online_serving/image_to_image
@ZJY0516 OK, I will try it in the latest vLLM-Omni Docker image. I could not find anything about that in the docs.
The models don't work with parallelism acceleration. Can anyone write code that supports one of the four architectures: Ulysses-SP, Ring-SP, CFG-Parallel, or Tensor-Parallel? I tried one H2O GPU vs. two H2O GPUs and the speed stayed the same. Model FLUX.2-klein-9B, tested with --usp 2, --ring 2 (hybrid), and Tensor-Parallel.
Could you please try #973?
Purpose
support black-forest-labs/FLUX.2-klein
Test Plan
vLLM-Omni:
online serving:
Test Result
vLLM-Omni:

diffusers:
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model.