DALL-E compatible image generation endpoint #292
hsliuustc0106 merged 3 commits into vllm-project:main
Conversation
We decided in the maintainers' call, with helpful input from Roger Wang (thank you!), to start with a single endpoint, /v1/images/generations -- I'll put that together as a next iteration.
Alright -- I've gone ahead with a refactor on this PR to address comments from Thursday's maintainers' call. The gist is that I reduced this down to just the /v1/images/generations endpoint and removed the image edit endpoint. There's still a lot that forms the basis of the single endpoint, plus plenty of testing and docs, so I broke it out into three commits with commit messages tagged [docs], [tests], and [feature] so that it's a little easier to review. Appreciate the input!
hsliuustc0106 left a comment:
please also align with #274
I've got the branch rebased on main, and I've incorporated the style used in #274 for the documentation in my docs update. Thanks for letting me know!
#259 has been merged now. Could you please rebase this PR on the newest main? Thanks!
> The server automatically enables VAE slicing and tiling for memory optimization.
> ### Invalid Size Format

Removed this section after the above fixes.
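The "Invalid Size Format" error path boils down to a small validator. As a rough sketch, assuming the parse_size utility mentioned in this PR accepts OpenAI-style "WIDTHxHEIGHT" strings (the exact implementation here is an assumption, not the PR's code):

```python
import re


def parse_size(size: str) -> tuple[int, int]:
    """Parse an OpenAI-style size string like "1024x1024" into (width, height).

    Hypothetical sketch: the real image_api_utils.parse_size may differ.
    Raises ValueError for anything that is not '<int>x<int>'.
    """
    match = re.fullmatch(r"(\d+)x(\d+)", size)
    if match is None:
        raise ValueError(f"Invalid size format {size!r}; expected 'WIDTHxHEIGHT'")
    return int(match.group(1)), int(match.group(2))
```

A server can map the ValueError to a 422 response, which matches the error codes listed in the test commit below.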
docs/.nav.yml (Outdated)

> - Online Serving:
>   - Qwen2.5-Omni: user_guide/examples/online_serving/qwen2_5_omni.md
>   - Qwen3-Omni: user_guide/examples/online_serving/qwen3_omni.md
>   - Image Generation API: user_guide/examples/online_serving/image_generation_api.md
Shall we move this somewhere else in the docs?
Happy to take suggestions on where it could be moved or improved.
This seemed like a logical place to start, but, potentially with integrations for what will now be two OpenAI-style API endpoints (completions and images/generations) that can produce image output, we could add an OpenAI-compatible API Endpoints section, maybe in the user guide?
Yes, we can check https://docs.vllm.ai/en/latest/serving/openai_compatible_server/ as a reference.
As discussed in the maintainers' call on 12/18, I'm going to scope this PR down to just adding the API endpoint and remove the "diffusion model profile" abstraction for now. This keeps the PR focused on the endpoint and avoids baking in an abstraction prematurely. Separately, I'll think about a follow-up design around model identification / family detection and how (or whether) we want to formalize model-specific parameters long-term. I'll come through with doc updates and a round of refactors tomorrow morning, thanks for the input!
Alright, the latest push should reflect the state we discussed: the model-specific validations and defaults are now a separate concern, and the endpoint is model agnostic. The work is cinched in to the text-to-image endpoint plus testing. Test results from an end-to-end run using vllm serve + curl, as well as a test run, are in this gist: https://gist.github.com/dougbtv/7bc9041e2319b094321f3dcdf84f32dc Much to my merriment, omitting parameters still appears to nominally "work" with the models I tested (Z-Image-Turbo and Qwen-Image). I'm not sure yet where the defaults are coming from, but from a UX perspective it's a good start even without model-family-specific parameter validation.
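The "omitting parameters still works" behavior follows from the pass-through design: unset fields are simply not serialized into the request, so the pipeline falls back to its own defaults. A minimal sketch of building such a request body; the field names mirror this PR's request model, but the helper itself is hypothetical:

```python
import json


def build_generation_payload(prompt: str, **optional) -> str:
    """Build a /v1/images/generations request body, dropping unset options.

    Hypothetical helper: optional fields left as None are omitted entirely,
    letting the underlying diffusion model use its own defaults.
    """
    payload = {"prompt": prompt}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return json.dumps(payload)


# seed and num_inference_steps are unset, so they never appear in the JSON.
body = build_generation_payload(
    "a watercolor fox", size="1024x1024", seed=None, num_inference_steps=None
)
```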
Please rebase the code to test CI.
Add comprehensive documentation for the OpenAI DALL-E compatible image generation API in vLLM-Omni. The documentation covers:

- API endpoint specification and request/response formats
- Quick start examples using curl, Python requests, and the OpenAI SDK
- Parameter descriptions with the pass-through design
- Multiple example scenarios (multiple images, negative prompts, etc.)
- Error responses and troubleshooting guidance
- Testing and development instructions

The API uses a pass-through design where parameters are forwarded directly to the diffusion pipeline without model-specific transformation. This keeps the API simple and focused on the OpenAI-compatible endpoint.

Signed-off-by: dougbtv <[email protected]>
Add full test coverage for the OpenAI-compatible image generation API:

**Utility Tests:**
- Size parsing with valid/invalid formats and edge cases
- Image base64 encoding/decoding

**Integration Tests:**
- Single and multiple image generation
- Custom parameters (negative prompt, seed, steps, guidance scales)
- Size validation and error handling

**Pass-Through Tests:**
- Verify parameters forwarded without modification
- Verify optional parameters omitted when not provided
- Model field validation

**Error Handling:**
- Missing required fields (422)
- Invalid parameters (422)
- Uninitialized engine (503)
- Unsupported response format (422)

All tests use a mocked AsyncOmniDiffusion to avoid GPU dependencies. The pass-through tests verify that the API correctly forwards user parameters to the diffusion engine without model-specific transformation.

Signed-off-by: dougbtv <[email protected]>
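The pass-through tests described above can be sketched with a mocked engine standing in for AsyncOmniDiffusion, so no GPU is needed. The handler function here is a hypothetical reduction of the endpoint's forwarding logic, not the PR's actual code:

```python
import asyncio
from unittest.mock import AsyncMock


async def handle_generation(engine, prompt: str, **options):
    """Hypothetical handler core: forward only the options the user set."""
    params = {"prompt": prompt}
    params.update({k: v for k, v in options.items() if v is not None})
    return await engine.generate(**params)


# Mocked engine in place of AsyncOmniDiffusion; generate() is awaitable.
engine = AsyncMock()
engine.generate.return_value = ["<png-bytes>"]

result = asyncio.run(
    handle_generation(engine, "a red fox", seed=7, negative_prompt=None)
)

# The unset negative_prompt must not reach the engine.
engine.generate.assert_called_once_with(prompt="a red fox", seed=7)
```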
Implement an OpenAI-compatible /v1/images/generations endpoint for text-to-image generation using diffusion models (Qwen-Image, Z-Image-Turbo).

**Key Features:**
- OpenAI DALL-E API compatibility (prompt, model, n, size, response_format)
- vLLM-Omni extensions (num_inference_steps, guidance_scale, true_cfg_scale, negative_prompt, seed)
- Pass-through parameter design: forwards user values directly to the pipeline
- Automatic server detection and routing for diffusion models
- Base64 PNG response format

**Implementation:**
- api_server.py: POST /v1/images/generations endpoint with pass-through logic
- protocol/images.py: Pydantic models for request/response validation
- image_api_utils.py: utility functions (parse_size, encode_image_base64)
- Integration with AsyncOmniDiffusion for async image generation

**Design Philosophy:**
The API uses a pass-through design where parameters are forwarded directly to the diffusion engine without model-specific transformation. When optional parameters are not provided, they are omitted entirely, allowing the underlying model to use its own defaults. This keeps the API simple and focused on the OpenAI-compatible endpoint without building model-specific abstractions. Model mismatch warnings are logged but do not block requests (the server's loaded model is always used).

Signed-off-by: dougbtv <[email protected]>
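The base64 PNG response format mentioned above is straightforward to sketch. encode_image_base64 is the utility name from this commit's image_api_utils.py, but this implementation and the make_response helper are assumptions for illustration:

```python
import base64
import time


def encode_image_base64(png_bytes: bytes) -> str:
    """Encode raw PNG bytes for the b64_json response field.

    Hypothetical sketch of the image_api_utils.encode_image_base64 utility.
    """
    return base64.b64encode(png_bytes).decode("ascii")


def make_response(images: list[bytes]) -> dict:
    """Assemble an OpenAI-style images response (illustrative, not the PR's code)."""
    return {
        "created": int(time.time()),
        "data": [{"b64_json": encode_image_base64(img)} for img in images],
    }


# PNG magic bytes stand in for a real generated image.
resp = make_response([b"\x89PNG\r\n\x1a\n"])
```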
Thanks! I rebased my branch and am monitoring test results now, appreciate it.
Existing tests are passing, and the new tests pass as well: https://buildkite.com/vllm/vllm-omni/builds/1112#019b4661-385f-4d1a-9a59-5c82c507b160/L278
Signed-off-by: dougbtv <[email protected]>
Signed-off-by: wangyu31577 <[email protected]>
quick overview.
This introduces a /v1/images/generations OpenAI API endpoint, intended to follow the DALL-E compatible endpoint. This enables serving diffusion models through an OpenAI-compatible API. It is in addition to generating diffusion outputs using the completions API, following the methodology defined and merged in the diffusion online serving PR #259.
cc: @fake0fan (thanks for getting the work off to a great start in 259!)
Example client implementation @ https://github.com/dougbtv/comfyui-vllm-omni/
review tips.
When reviewing, I recommend going commit by commit; the changes are broken into [docs], [testing], and [feature] commits so you can isolate just the docs, tests, and feature changes during your review.
design thoughts.
This builds directly on the async diffusion serving work introduced in #259 and adds a dedicated diffusion image-generation endpoint, rather than relying solely on the completions API.
The primary goal of this PR is to add the endpoint itself and make it usable end-to-end. While earlier iterations explored a model abstraction layer for enforcing model-specific defaults and constraints, that has been intentionally removed here to keep the scope tight.
overview.
[Feature] Add OpenAI DALL-E compatible image generation API
Builds on @fake0fan's diffusion online serving implementation to provide
a production-ready, OpenAI-compatible image generation API. Implements
the DALL-E /v1/images/generations endpoint with full async support and
proper error handling.
This implementation focuses on generation-only (not editing) to keep
the initial PR manageable while maintaining full functionality and
extensibility.
OpenAI DALL-E API Compatibility:

Unified Async Server:
- vllm serve <model> --omni command for all diffusion models

Features:
Built on @fake0fan's excellent diffusion online serving work. This PR
adds the DALL-E compatible API layer.