Merged · Changes from 1 commit
8 changes: 7 additions & 1 deletion vllm_omni/diffusion/worker/diffusion_model_runner.py
@@ -159,7 +159,13 @@ def execute_model(self, req: OmniDiffusionRequest) -> DiffusionOutput:
        self.kv_transfer_manager.receive_kv_cache(req, target_device=getattr(self.pipeline, "device", None))

        if req.sampling_params.generator is None and req.sampling_params.seed is not None:
-           req.sampling_params.generator = torch.Generator(device=self.device).manual_seed(req.sampling_params.seed)
+           if req.sampling_params.generator_device is not None:
+               gen_device = req.sampling_params.generator_device
Collaborator:
I think users shouldn't be aware of the backend hardware model; it's very confusing.

Collaborator (author):
I see your concern. What if we only give users two choices, "device" and "host"? I think it's necessary to let users choose a CPU generator. In fact, our current offline examples already expose the generator to users. It may be better to remove req.sampling_params.generator and keep only a device string limited to "device" and "host", with the dispatch written in the model runner.

generator = torch.Generator(device=current_omni_platform.device_type).manual_seed(args.seed)

Collaborator @david6666666 (Feb 4, 2026):
req.sampling_params.generator is reasonable; it's generator_device that I find confusing.

Collaborator (author) @gcanlin (Feb 4, 2026):
> req.sampling_params.generator is reasonable; it's generator_device that I find confusing.

In offline inference, users can choose their own generator, but in online serving they can't, which is what this PR addresses. At minimum, offline and online should stay consistent.

Otherwise we face the question: how does a user request a CPU generator through the online server? In offline inference it is easy:

generator = torch.Generator(device=current_omni_platform.device_type).manual_seed(args.seed)

Collaborator:
Yeah, that makes sense for online serving. But does vLLM have a similar parameter we can refer to, or could we change the parameter name?

Collaborator (author):
In SGLang-Diffusion it is also called generator_device. vLLM doesn't seem to have a similar parameter. For now, I think we can keep this parameter temporarily so that online and offline stay consistent, and unify generator_device and generator in a follow-up PR.
https://github.com/sgl-project/sglang/blob/c1d529c19605cbf1f9be8db6d6d225b1465ea2e0/python/sglang/multimodal_gen/runtime/entrypoints/openai/image_api.py#L260

Collaborator (author):
@david6666666 Could you please take another look? I have added the test results.

+           elif self.device.type == "cpu":
Collaborator:
nit: Why not just gen_device = self.device?

Collaborator (author):
We want to give users the ability to choose whether the generator lives on the host or the device. If we used gen_device = self.device directly, it would always be the device. In the online scenario, users can't pass a whole generator object, so we need to initialize the generator on the server side. This field lets users request a CPU generator even when the model runs on a GPU.

+               gen_device = "cpu"
+           else:
+               gen_device = self.device
+           req.sampling_params.generator = torch.Generator(device=gen_device).manual_seed(req.sampling_params.seed)

        # Refresh cache context if needed
        if (
3 changes: 3 additions & 0 deletions vllm_omni/entrypoints/openai/api_server.py
@@ -969,6 +969,7 @@ async def generate_images(request: ImageGenerationRequest, raw_request: Request)
        # This fixes issues where using the default global generator
        # might produce blurry images in some environments.
        _update_if_not_none(gen_params, "seed", random.randint(0, 2**32 - 1) if request.seed is None else request.seed)
+       _update_if_not_none(gen_params, "generator_device", request.generator_device)

        request_id = f"img_gen_{uuid.uuid4().hex}"

@@ -1045,6 +1046,7 @@ async def edit_images(
    guidance_scale: float | None = Form(None),
    true_cfg_scale: float | None = Form(None),
    seed: int | None = Form(None),
+   generator_device: str | None = Form("cpu"),
    # vllm-omni extension for per-request LoRA.
    lora: str | None = Form(None),  # Json string
) -> ImageGenerationResponse:
@@ -1127,6 +1129,7 @@
        # This fixes issues where using the default global generator
        # might produce blurry images in some environments.
        _update_if_not_none(gen_params, "seed", seed or random.randint(0, 2**32 - 1))
+       _update_if_not_none(gen_params, "generator_device", generator_device)

        # 4. Generate images using AsyncOmni (multi-stage mode)
        request_id = f"img_edit_{int(time.time())}"
4 changes: 4 additions & 0 deletions vllm_omni/entrypoints/openai/protocol/images.py
@@ -88,6 +88,10 @@ def validate_response_format(cls, v):
        description="True CFG scale (model-specific parameter, may be ignored if not supported)",
    )
    seed: int | None = Field(default=None, description="Random seed for reproducibility")
+   generator_device: str | None = Field(
+       default=None,
+       description="Device for the seeded torch.Generator (e.g. 'cpu', 'cuda'). Defaults to the runner's device.",
+   )

    # vllm-omni extension for per-request LoRA.
    # This mirrors the `extra_body.lora` convention in /v1/chat/completions.
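With the protocol field above, an online request can pin the seeded generator to the host. A hypothetical payload sketch follows; only generator_device is introduced by this PR, and the prompt value is illustrative:

```python
import json

# Hypothetical request body for the image-generation endpoint;
# only "generator_device" is new in this PR.
payload = {
    "prompt": "a watercolor fox",   # illustrative prompt
    "seed": 1234,                   # reproducible noise
    "generator_device": "cpu",      # seed the torch.Generator on the host
}

print(json.dumps(payload, indent=2))
```

Omitting generator_device keeps the previous behavior of seeding on the runner's device.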
1 change: 1 addition & 0 deletions vllm_omni/inputs/data.py
@@ -166,6 +166,7 @@ class OmniDiffusionSamplingParams:
    num_outputs_per_prompt: int = 1
    seed: int | None = None
    generator: torch.Generator | list[torch.Generator] | None = None
+   generator_device: str | None = None

    # layered info
    layers: int = 4