[model] Add UltraFlux-v1-image support #611

Open
erfgss wants to merge 58 commits into vllm-project:main from erfgss:feat/UltraFlux-v1-image

Conversation

@erfgss (Contributor) commented Jan 4, 2026


Purpose

Add UltraFlux-v1-image support #327

Test Plan

python text_to_image.py \
  --model Owen777/UltraFlux-v1 \
  --prompt "A vast rocky landscape dominated by towering, weathered stone formations, bathed in the ethereal glow of a vibrant night sky filled with a sea of stars, the Milky Way stretching across the heavens, captured from a low angle to emphasize the immense scale of the rocks against the expansive cosmos above. The scene is illuminated by soft, cool moonlight, casting long, dramatic shadows on the textured rock surfaces. The color palette is rich with deep blues, purples, and silvery whites, creating a serene, otherworldly atmosphere." \
  --height 4096 \
  --width 4096 \
  --output UltraFlux-v1_image_output.png \
  --cache_backend cache_dit

Test Result

without cache_dit

Processed prompts: 100%|██████████| 1/1 [02:00<00:00, 120.95s/img, est. speed stage-0 img/s: 0.00, avg e2e_lat: 0.0ms]
INFO 01-04 03:05:42 [omni.py:687] [Summary] {'e2e_requests': 1,
INFO 01-04 03:05:42 [omni.py:687]  'e2e_total_time_ms': 120952.95739173889,
INFO 01-04 03:05:42 [omni.py:687]  'e2e_sum_time_ms': 120951.51662826538,
INFO 01-04 03:05:42 [omni.py:687]  'e2e_total_tokens': 0,
INFO 01-04 03:05:42 [omni.py:687]  'e2e_avg_time_per_request_ms': 120951.51662826538,
INFO 01-04 03:05:42 [omni.py:687]  'e2e_avg_tokens_per_s': 0.0,
INFO 01-04 03:05:42 [omni.py:687]  'wall_time_ms': 120952.95739173889,
INFO 01-04 03:05:42 [omni.py:687]  'final_stage_id': {'0_5208ad13-e972-40b0-b30a-45fbac8b7d4e': 0},
INFO 01-04 03:05:42 [omni.py:687]  'stages': [{'stage_id': 0,
INFO 01-04 03:05:42 [omni.py:687]              'requests': 1,
INFO 01-04 03:05:42 [omni.py:687]              'tokens': 0,
INFO 01-04 03:05:42 [omni.py:687]              'total_time_ms': 120951.89571380615,
INFO 01-04 03:05:42 [omni.py:687]              'avg_time_per_request_ms': 120951.89571380615,
INFO 01-04 03:05:42 [omni.py:687]              'avg_tokens_per_s': 0.0}],
INFO 01-04 03:05:42 [omni.py:687]  'transfers': []}
Adding requests:   0%|          | 0/1 [02:00<?, ?it/s]
[Stage-0] ERROR 01-04 03:05:42 [omni_stage.py:636] Received shutdown signal
[Stage-0] INFO 01-04 03:05:42 [gpu_worker.py:265] Worker 0: Received shutdown message
[Stage-0] INFO 01-04 03:05:42 [gpu_worker.py:287] event loop terminated.
[Stage-0] INFO 01-04 03:05:43 [gpu_worker.py:318] Worker 0: Shutdown complete.
INFO 01-04 03:05:46 [text_to_image.py:168] Outputs: [OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='image', request_output=[OmniRequestOutput(request_id='0_5208ad13-e972-40b0-b30a-45fbac8b7d4e', finished=True, stage_id=None, final_output_type='image', request_output=None, images=[1 PIL Images], prompt='A vast rocky landscape dominated by towering, weathered stone formations, bathed in the ethereal glow of a vibrant night sky filled with a sea of stars, the Milky Way stretching across the heavens, captured from a low angle to emphasize the immense scale of the rocks against the expansive cosmos above. The scene is illuminated by soft, cool moonlight, casting long, dramatic shadows on the textured rock surfaces. The color palette is rich with deep blues, purples, and silvery whites, creating a serene, otherworldly atmosphere.', latents=None, metrics={})], images=[], prompt=None, latents=None, metrics={})]
Saved generated image to UltraFlux-v1_image_output.png

(generated image: without cache_dit)

with cache_dit

Processed prompts: 100%|██████████| 1/1 [00:43<00:00, 43.56s/img, est. speed stage-0 img/s: 0.00, avg e2e_lat: 0.0ms]
INFO 01-04 03:09:29 [omni.py:687] [Summary] {'e2e_requests': 1,
INFO 01-04 03:09:29 [omni.py:687]  'e2e_total_time_ms': 43562.40963935852,
INFO 01-04 03:09:29 [omni.py:687]  'e2e_sum_time_ms': 43560.83941459656,
INFO 01-04 03:09:29 [omni.py:687]  'e2e_total_tokens': 0,
INFO 01-04 03:09:29 [omni.py:687]  'e2e_avg_time_per_request_ms': 43560.83941459656,
INFO 01-04 03:09:29 [omni.py:687]  'e2e_avg_tokens_per_s': 0.0,
INFO 01-04 03:09:29 [omni.py:687]  'wall_time_ms': 43562.40963935852,
INFO 01-04 03:09:29 [omni.py:687]  'final_stage_id': {'0_81384414-7d7f-4871-ace1-48322e07f1f2': 0},
INFO 01-04 03:09:29 [omni.py:687]  'stages': [{'stage_id': 0,
INFO 01-04 03:09:29 [omni.py:687]              'requests': 1,
INFO 01-04 03:09:29 [omni.py:687]              'tokens': 0,
INFO 01-04 03:09:29 [omni.py:687]              'total_time_ms': 43561.26618385315,
INFO 01-04 03:09:29 [omni.py:687]              'avg_time_per_request_ms': 43561.26618385315,
INFO 01-04 03:09:29 [omni.py:687]              'avg_tokens_per_s': 0.0}],
INFO 01-04 03:09:29 [omni.py:687]  'transfers': []}
Adding requests:   0%|          | 0/1 [00:43<?, ?it/s]
[Stage-0] ERROR 01-04 03:09:29 [omni_stage.py:636] Received shutdown signal
[Stage-0] INFO 01-04 03:09:29 [gpu_worker.py:265] Worker 0: Received shutdown message
[Stage-0] INFO 01-04 03:09:29 [gpu_worker.py:287] event loop terminated.
[Stage-0] INFO 01-04 03:09:29 [gpu_worker.py:318] Worker 0: Shutdown complete.
INFO 01-04 03:09:32 [text_to_image.py:168] Outputs: [OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='image', request_output=[OmniRequestOutput(request_id='0_81384414-7d7f-4871-ace1-48322e07f1f2', finished=True, stage_id=None, final_output_type='image', request_output=None, images=[1 PIL Images], prompt='A vast rocky landscape dominated by towering, weathered stone formations, bathed in the ethereal glow of a vibrant night sky filled with a sea of stars, the Milky Way stretching across the heavens, captured from a low angle to emphasize the immense scale of the rocks against the expansive cosmos above. The scene is illuminated by soft, cool moonlight, casting long, dramatic shadows on the textured rock surfaces. The color palette is rich with deep blues, purples, and silvery whites, creating a serene, otherworldly atmosphere.', latents=None, metrics={})], images=[], prompt=None, latents=None, metrics={})]
Saved generated image to UltraFlux-v1_image_output.png

(generated image: with cache_dit)

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


@erfgss erfgss requested a review from hsliuustc0106 as a code owner January 4, 2026 09:29

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7af3b2287a


Comment on lines 44 to 46
"pipeline_wan2_2",
"Wan22Pipeline",
),


P1: Preserve existing diffusion model registrations

This registry block now omits several existing pipelines (e.g., WanImageToVideoPipeline, BagelPipeline, LongCatImageEditPipeline, StableDiffusion3Pipeline) even though their modules still exist under vllm_omni/diffusion/models. Since initialize_model looks up od_config.model_class_name in _DIFFUSION_MODELS and raises a ValueError when missing, any config that previously used those model_class_name values will now fail to initialize. Please restore those entries or explicitly deprecate them across docs/configs.
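The lookup failure Codex describes can be sketched as follows. The dictionary contents and helper name below are illustrative, not vllm-omni's actual code; only the `_DIFFUSION_MODELS` name, the `model_class_name` key, and the ValueError behavior come from the review above.

```python
# Minimal sketch of the registry lookup pattern under discussion.
# Entry values mimic the (folder, module, class) tuples in the quoted diff.
_DIFFUSION_MODELS = {
    "Wan22Pipeline": ("wan2_2", "pipeline_wan2_2", "Wan22Pipeline"),
    # If entries like "WanImageToVideoPipeline" are dropped here, configs
    # that still reference them fail at initialization.
}

def resolve_model_class(model_class_name: str):
    # Mirrors the failure mode: a missing key raises ValueError.
    entry = _DIFFUSION_MODELS.get(model_class_name)
    if entry is None:
        raise ValueError(f"Model class {model_class_name} not found")
    return entry

resolve_model_class("Wan22Pipeline")  # ok
try:
    resolve_model_class("WanImageToVideoPipeline")
except ValueError as e:
    print(e)  # Model class WanImageToVideoPipeline not found
```

This is why removing registrations is a silent breaking change: nothing fails until a config referencing the removed key reaches initialization.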


Comment on lines 52 to 56
#UltraFlux
"FluxPipeline": (
"ultraflux-v1_image",
"pipeline_ultraflux",
"UltraFluxPipeline",


P1: Register UltraFlux under the UltraFluxPipeline key

The new UltraFlux entry is keyed as FluxPipeline, but the actual class and docs use UltraFluxPipeline. If users set model_class_name=UltraFluxPipeline as documented, _DIFFUSION_MODELS lookup will fail and initialize_model will raise “Model class UltraFluxPipeline not found.” Please register it under UltraFluxPipeline (or update naming consistently) so the model is loadable.


{
"WanPipeline": enable_cache_for_wan22,
"FluxPipeline": enable_cache_for_flux,
"UltraFluxPipeline": enable_cache_for_flux,
Collaborator

Maybe you need to use a new function name, such as enable_cache_for_ultraflux.

"pipeline_wan2_2",
"Wan22Pipeline",
),
"WanImageToVideoPipeline": (
Collaborator

Do not delete existing entries; just add your pipeline.

"StableDiffusion3Pipeline",
),
#UltraFlux
"FluxPipeline": (
Collaborator

Rename to UltraFluxPipeline.

Collaborator

agree

od_config: OmniDiffusionConfig,
):
model_class = DiffusionModelRegistry._try_load_model_cls(od_config.model_class_name)
print("DEBUG model_class =", model_class)
Collaborator

Delete the debug log.

Contributor (Author)

ok

"LongCatImagePipeline",
),
"BagelPipeline": (
"BagelPipeline": (
Collaborator

revert here

# where mod_folder and mod_relname are defined and mapped using `_DIFFUSION_MODELS` via the `arch` key
"QwenImageEditPipeline": "get_qwen_image_edit_pre_process_func",
"QwenImageEditPlusPipeline": "get_qwen_image_edit_plus_pre_process_func",
"QwenImageLayeredPipeline": "get_qwen_image_layered_pre_process_func",
Collaborator

remove this

@erfgss (Contributor, Author) commented Jan 6, 2026

@SamitHuang

@@ -4,7 +4,7 @@
import multiprocessing as mp
Collaborator

Adding a model should not change diffusion_engine.

| `WanPipeline` | Wan2.2-T2V, Wan2.2-TI2V | `Wan-AI/Wan2.2-T2V-A14B-Diffusers`, `Wan-AI/Wan2.2-TI2V-5B-Diffusers` |
| `WanImageToVideoPipeline` | Wan2.2-I2V | `Wan-AI/Wan2.2-I2V-A14B-Diffusers` |
| `OvisImagePipeline` | Ovis-Image | `OvisAI/Ovis-Image` |
|`LongcatImagePipeline` | LongCat-Image | `meituan-longcat/LongCat-Image` |
Collaborator

mkdocs.yml Outdated
- "vllm_omni.entrypoints.async_diffusion" # avoid importing vllm in mkdocs building
- "vllm_omni.entrypoints.openai" # avoid importing vllm in mkdocs building
- "vllm_omni.entrypoints.openai.protocol" # avoid importing vllm in mkdocs building
- "vllm_omni.entrypoints.omni" # avoid importing vllm in mkdocs building
Collaborator

Why do we need to change this?


# Summarize and print stats
try:
import json as _json
Collaborator

Why do we need to change this? @Bounty-hunter PTAL

@@ -4,6 +4,7 @@
from dataclasses import dataclass
Collaborator

This PR is designed for adding a model; you should not make any changes to these files.

def _initialize_stages(self, model: str, kwargs: dict[str, Any]) -> None:
"""Initialize stage list management."""
stage_init_timeout = kwargs.get("stage_init_timeout", 20)
# Diffusion/large models can take long to load; align default with CLI (300s)
Collaborator

You should not change this default value; provide it via your CLI instead.

pass
logger.debug("Engine initialized")

# Check if stage engine supports profiling (via vLLM's built-in profiler)
Collaborator

Why do you need to change omni_stage in a model-support PR?

}
)

return result
Collaborator

should not change in this PR

@erfgss erfgss force-pushed the feat/UltraFlux-v1-image branch from 560c1a1 to 448588b Compare January 13, 2026 08:54
@david6666666 david6666666 mentioned this pull request Jan 16, 2026
57 tasks
"StableDiffusion3Pipeline",
),
#UltraFlux
"FluxPipeline": (
Collaborator

agree

@@ -0,0 +1,931 @@
# Copyright 2025 Black Forest Labs, The HuggingFace Team and The InstantX Team. All rights reserved.
Collaborator

Add support for TP; please check #735.

| `WanImageToVideoPipeline` | Wan2.2-I2V | `Wan-AI/Wan2.2-I2V-A14B-Diffusers` |
| `OvisImagePipeline` | Ovis-Image | `OvisAI/Ovis-Image` |
|`LongcatImagePipeline` | LongCat-Image | `meituan-longcat/LongCat-Image` |
|`LongCatImageEditPipeline` | LongCat-Image-Edit | `meituan-longcat/LongCat-Image-Edit` |
Collaborator

Please also update the diffusion acceleration doc for cache-dit support.

@hsliuustc0106
Collaborator

@david6666666 can we not use benchmark serving under the benchmarks folder for t2i jobs?

@david6666666
Collaborator

Please make similar modifications based on the review comments in #809.

  • Please use the attention layer in vllm_omni/diffusion.

  • We have a RoPE layer in vllm-omni.

  • Just import them to reduce copying local functions.

  • Use from vllm.model_executor.layers.layernorm import RMSNorm.

  • Use from vllm.model_executor.layers.linear import QKVParallelLinear, ReplicatedLinear.

erfgss and others added 18 commits January 19, 2026 10:58
Signed-off-by: Chen Yang <[email protected]>
…in Ring Attention (vllm-project#767)
Signed-off-by: XU Mingshi <[email protected]>
Signed-off-by: mxuax <[email protected]>
Signed-off-by: David Chen <[email protected]>

# Conflicts:
#	vllm_omni/diffusion/registry.py
@erfgss erfgss force-pushed the feat/UltraFlux-v1-image branch from 8fc9d02 to 67a48fb Compare January 19, 2026 03:18
@erfgss (Contributor, Author) commented Jan 19, 2026

Please make similar modifications based on the review comments in #809.

  • please use attention layer in vllm_omni/diffusion.
  • we have rope layer in vllm-omni.
  • Just import them to reduce copying local functions.
  • use from vllm.model_executor.layers.layernorm import RMSNorm
  • use from vllm.model_executor.layers.linear import QKVParallelLinear, ReplicatedLinear


Signed-off-by: Chen Yang <[email protected]>
@erfgss erfgss requested a review from david6666666 January 19, 2026 08:18
@david6666666
Collaborator

david6666666 commented Feb 4, 2026

  • Support SP(ulysses, ring)
  • Support TP
  • Support CFG parallel
  • validate Cache-DiT

@lishunyang12 (Contributor) left a comment

Thanks for the contribution — left a few comments inline on things I noticed.

|`Qwen3TTSForConditionalGeneration` | Qwen3-TTS-12Hz-1.7B-VoiceDesign | `Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign` |
|`Qwen3TTSForConditionalGeneration` | Qwen3-TTS-12Hz-1.7B-Base | `Qwen/Qwen3-TTS-12Hz-0.6B-Base` |

|`UltraFluxPipeline` | UltraFlux-v1 | `Owen777/UltraFlux-v1` |
@lishunyang12 (Contributor) commented Feb 22, 2026

Looks like this diff might have accidentally removed the three Qwen3-TTS entries — probably a rebase artifact? The UltraFlux line should be added alongside them.

self.default_sample_size = 64

print(self.vae.config)
print(self.transformer.config)
@lishunyang12 (Contributor) commented Feb 22, 2026

A few debug print() calls here — might want to swap them for logger.debug() or remove before merging.

scheduler=scheduler,
)

self.vae_scale_factor = 32
@lishunyang12 (Contributor) commented Feb 22, 2026

Quick question — vae_scale_factor = 32 while standard Flux uses 16. Is 32 correct for UltraFlux, or a copy-paste from somewhere? If it's intentional, a brief comment explaining why would be helpful.

def _get_fused_projections(attn: "FluxAttention", hidden_states, encoder_hidden_states=None):
query, key, value = attn.to_qkv(hidden_states).chunk(3, dim=-1)

encoder_query = encoder_key = encoder_value = (None,)
@lishunyang12 (Contributor) commented Feb 22, 2026

I think there might be a small issue here — (None,) creates a 1-tuple rather than assigning None to all three variables. This could cause problems downstream when calling .unflatten(...) on a tuple. Maybe:

encoder_query = encoder_key = encoder_value = None
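A quick self-contained demonstration of the pitfall: chained assignment binds the same object to all three names, so with (None,) each name holds a 1-tuple and any `is not None` guard passes unexpectedly.

```python
# The buggy form: every name is bound to the 1-tuple (None,).
encoder_query = encoder_key = encoder_value = (None,)
assert encoder_query == (None,)
assert encoder_query is not None            # the None-guard no longer fires
assert not hasattr(encoder_query, "unflatten")  # tuples lack tensor methods

# The suggested fix assigns the plain None sentinel:
encoder_query = encoder_key = encoder_value = None
assert encoder_query is None
```

So downstream, the tuple variant would raise AttributeError on a call like `.unflatten(...)` instead of being caught by an `is None` check.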

mscale = torch.where(
scale <= 1.0, torch.tensor(1.0, device=scale.device, dtype=scale.dtype), 0.1 * torch.log(scale) + 1.0
)
mscale = torch.where(
@lishunyang12 (Contributor) commented Feb 22, 2026

Looks like this mscale computation is a duplicate of lines 612-614. Probably a copy-paste leftover — the second one could be removed.
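For reference, a scalar sketch of the expression that the duplicated torch.where computes (this is an illustrative equivalent under the assumption that the quoted formula is the intended one, not the PR's actual code):

```python
import math

def mscale(scale: float) -> float:
    # Scalar equivalent of the torch.where above:
    # 1.0 when scale <= 1.0, otherwise 0.1 * ln(scale) + 1.0.
    # One evaluation suffices; the duplicated torch.where can be dropped.
    return 1.0 if scale <= 1.0 else 0.1 * math.log(scale) + 1.0

print(mscale(0.5))  # 1.0
```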

self.added_kv_proj_dim = added_kv_proj_dim
self.added_proj_bias = added_proj_bias

self.norm_q = torch.nn.RMSNorm(dim_head, eps=eps, elementwise_affine=elementwise_affine)
@lishunyang12 (Contributor) commented Feb 22, 2026

I saw in the earlier review that the maintainer suggested using from vllm.model_executor.layers.layernorm import RMSNorm instead of torch.nn.RMSNorm. Looks like it still needs to be updated here and on lines 300, 311, 312.

for tok_name in ("tokenizer", "tokenizer_2"):
tok = getattr(self.pipe, tok_name, None)
if tok is not None and hasattr(tok, "model_max_length"):
tok.model_max_length = 512
@lishunyang12 (Contributor) commented Feb 22, 2026

Just wondering — hardcoding tok.model_max_length = 512 overrides whatever the tokenizer originally had. Would it make sense to read this from the model config instead?
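One hypothetical shape for that: consult the model config first, then the tokenizer's own limit, with 512 only as a last resort. The `max_sequence_length` field name and helper below are illustrative assumptions, not UltraFlux's actual config keys.

```python
def resolve_max_length(tokenizer, model_config, fallback: int = 512) -> int:
    # Prefer an explicit config value over a hardcoded override.
    configured = getattr(model_config, "max_sequence_length", None)
    if configured is not None:
        return configured
    # Fall back to the tokenizer's own limit, then the hardcoded default.
    return getattr(tokenizer, "model_max_length", fallback)

class FakeTokenizer:        # stand-in for a real tokenizer
    model_max_length = 77

class FakeConfig:           # stand-in for a model config
    max_sequence_length = 512

print(resolve_max_length(FakeTokenizer(), FakeConfig()))  # 512
print(resolve_max_length(FakeTokenizer(), object()))      # 77
```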

"LongCatImagePipeline": enable_cache_for_longcat_image,
"LongCatImageEditPipeline": enable_cache_for_longcat_image,
"UltraFluxPipeline": enable_cache_for_ultraflux,
"LongcatImagePipeline": enable_cache_for_longcat_image,
@lishunyang12 (Contributor) commented Feb 22, 2026

The LongCatImagePipeline -> LongcatImagePipeline rename seems like a separate change. Might be worth mentioning in the PR description, or splitting it out if you prefer?

self.tokenizer_max_length = (
self.tokenizer.model_max_length if hasattr(self, "tokenizer") and self.tokenizer is not None else 77
)
self.default_sample_size = 64
@lishunyang12 (Contributor) commented Feb 22, 2026

With default_sample_size = 64 and vae_scale_factor = 32, the default resolution would be 2048x2048, but the test plan uses 4096x4096. Is 2048 the intended default?
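The arithmetic behind that question, assuming the default resolution is simply the product of the two values quoted from the diff:

```python
# Implied default latent-to-pixel resolution from the quoted values.
default_sample_size = 64
vae_scale_factor = 32
print(default_sample_size * vae_scale_factor)  # 2048, vs. 4096 in the test plan
```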
