DALL-E compatible image generation endpoint#292

Merged
hsliuustc0106 merged 3 commits into vllm-project:main from dougbtv:dalle-compat-image-api
Dec 23, 2025

Conversation

@dougbtv
Contributor

@dougbtv dougbtv commented Dec 11, 2025

quick overview.

This introduces a /v1/images/generations OpenAI API endpoint, intended to follow the DALL-E compatible endpoint. This enables serving diffusion models through an OpenAI compatible API.

This is in addition to generating diffusion outputs through the completions API, and it follows the approach defined and merged in the diffusion online serving PR #259.

cc: @fake0fan (thanks for getting the work off to a great start in 259!)

Example client implementation @ https://github.com/dougbtv/comfyui-vllm-omni/

review tips.

When reviewing, I recommend going commit by commit; the changes are broken into:

  • [docs]
  • [testing]
  • [feature]

so you can isolate just the changes / tests / docs during your review.

design thoughts.

This builds directly on the async diffusion serving work introduced in #259 and adds a dedicated diffusion image-generation endpoint, rather than relying solely on the completions API.

The primary goal of this PR is to add the endpoint itself and make it usable end-to-end. While earlier iterations explored a model abstraction layer for enforcing model-specific defaults and constraints, that has been intentionally removed here to keep the scope tight.


overview.

[Feature] Add OpenAI DALL-E compatible image generation API

Builds on @fake0fan's diffusion online serving implementation to provide
a production-ready, OpenAI-compatible image generation API. Implements
the DALL-E /v1/images/generations endpoint with full async support and
proper error handling.

This implementation focuses on generation-only (not editing) to keep
the initial PR manageable while maintaining full functionality and
extensibility.

OpenAI DALL-E API Compatibility:

  • /v1/images/generations - Text-to-image generation
  • Full compatibility with OpenAI Python SDK
  • Request/response formats match DALL-E specification
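As a rough illustration of the request and response shapes listed above, here is a minimal stdlib-only Python client sketch. The base URL `http://localhost:8000` and the helper names (`build_request`, `decode_images`, `generate`) are assumptions made for illustration and are not taken from this PR; the field names (`prompt`, `n`, `size`, `response_format`) and the `b64_json` key in the response follow the DALL-E format this PR targets.

```python
import base64
import json
import urllib.request

# Assumed server address; adjust to wherever `vllm serve <model> --omni` is listening.
BASE_URL = "http://localhost:8000"


def build_request(prompt, n=1, size="1024x1024"):
    """Build a DALL-E style request for /v1/images/generations (not sent yet)."""
    payload = {
        "prompt": prompt,
        "n": n,
        "size": size,
        "response_format": "b64_json",  # the only format this PR accepts
    }
    return urllib.request.Request(
        BASE_URL + "/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_images(response_body):
    """Decode every base64 image in a DALL-E style response into raw bytes."""
    return [base64.b64decode(item["b64_json"]) for item in response_body["data"]]


def generate(prompt, **kwargs):
    """Send the request and return decoded image bytes (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as resp:
        return decode_images(json.load(resp))
```

Calling `generate("a red fox", n=2)` against a running server would return a list of raw image byte strings that can be written straight to `.png` files.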

Unified Async Server:

  • Single vllm serve <model> --omni command for all diffusion models
  • Async AsyncOmniDiffusion engine with thread-pool execution
  • Exposes both /v1/images/generations and /v1/chat/completions

Features:

  • Pydantic validation for all request parameters
  • Explicit API-level errors (HTTP status codes, messages)
  • Model field validation and empty prompt validation
  • Response format validation (b64_json only)
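To make the validation rules in the feature list concrete, the sketch below expresses them with a plain dataclass. It is a dependency-free stand-in, not the PR's Pydantic models: the class name `ImageRequestSketch`, the use of `ValueError`, and the lower bound on `n` are assumptions for illustration. In the actual endpoint, invalid requests are reported as API-level errors rather than Python exceptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageRequestSketch:
    """Hypothetical stand-in for the PR's Pydantic request model."""

    prompt: str
    model: Optional[str] = None
    n: int = 1
    size: Optional[str] = None
    response_format: str = "b64_json"

    def __post_init__(self):
        # Empty or whitespace-only prompts are rejected.
        if not self.prompt or not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        # Assumed lower bound; the PR's actual limits on n are not shown here.
        if self.n < 1:
            raise ValueError("n must be at least 1")
        # Only base64 JSON responses are supported, per the feature list.
        if self.response_format != "b64_json":
            raise ValueError("response_format must be 'b64_json'")
```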

Built on @fake0fan's excellent diffusion online serving work. This PR
adds the DALL-E compatible API layer.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@dougbtv dougbtv force-pushed the dalle-compat-image-api branch 3 times, most recently from 48fee5a to 65ab272 Compare December 11, 2025 22:41
@dougbtv
Contributor Author

dougbtv commented Dec 11, 2025

We decided on the maintainers' call, with helpful input from Roger Wang (thank you!), to start with a single endpoint, /v1/images/generations. I'll put that together as the next iteration.

@dougbtv dougbtv marked this pull request as draft December 12, 2025 17:46
@dougbtv dougbtv force-pushed the dalle-compat-image-api branch 6 times, most recently from 3cf6521 to c981e03 Compare December 12, 2025 21:25
@dougbtv dougbtv changed the title DALL-E compatible image generation (and editing) endpoints DALL-E compatible image generation endpoint Dec 12, 2025
@dougbtv dougbtv force-pushed the dalle-compat-image-api branch from c981e03 to f540b9e Compare December 12, 2025 21:35
@dougbtv dougbtv marked this pull request as ready for review December 12, 2025 21:37
@dougbtv
Contributor Author

dougbtv commented Dec 12, 2025

Alright, I've refactored this PR to address the comments from Thursday's maintainers' call.

The gist is that I reduced this down to just the /v1/images/generations endpoint and removed the image edit endpoint. The single endpoint still needs a fair amount of groundwork, and there's also a lot of testing and docs.

So I broke it out into three commits, with commit messages like [docs], [tests], [feature] so that it's a little easier to review.

appreciate the input!

Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

please also align with #274

@hsliuustc0106
Collaborator

@gcanlin I think this is related to #197, PTAL

@dougbtv dougbtv force-pushed the dalle-compat-image-api branch 2 times, most recently from ffe73eb to a434a65 Compare December 15, 2025 19:46
@dougbtv
Contributor Author

dougbtv commented Dec 15, 2025

I've got the branch rebased on main, and I've incorporated the style used in #274 for documentation in my docs update, thanks for letting me know!

@dougbtv dougbtv force-pushed the dalle-compat-image-api branch 3 times, most recently from 925d687 to 02f3c8e Compare December 16, 2025 18:45
@gcanlin
Contributor

gcanlin commented Dec 17, 2025

#259 has been merged now. Could you please rebase this PR on the newest main? Thanks!

@dougbtv dougbtv force-pushed the dalle-compat-image-api branch 2 times, most recently from 1c5866b to 834957f Compare December 17, 2025 13:34

The server automatically enables VAE slicing and tiling for memory optimization.

### Invalid Size Format
Contributor Author

This is obsolete

Contributor Author

removed section after above fixes.

@dougbtv dougbtv force-pushed the dalle-compat-image-api branch from 5ca1cb5 to 4f4caed Compare December 17, 2025 16:00
docs/.nav.yml Outdated
- Online Serving:
- Qwen2.5-Omni: user_guide/examples/online_serving/qwen2_5_omni.md
- Qwen3-Omni: user_guide/examples/online_serving/qwen3_omni.md
- Image Generation API: user_guide/examples/online_serving/image_generation_api.md
Collaborator

shall we move this somewhere else in docs

Contributor Author

Happy to take suggestions on where it could be moved or improved.

This seemed like a logical place to start. But since there will now be two OpenAI-style API endpoints (completions and images/generations) that can produce image output, we could add an OpenAI-compatible API Endpoints section, maybe in the user guide?

Collaborator

@dougbtv
Contributor Author

dougbtv commented Dec 18, 2025

As discussed on the maintainers' call on 12/18, I'm going to scope this PR down to just adding the API endpoint and remove the "diffusion model profile" abstraction for now.

This keeps the PR focused on the endpoint and avoids baking in an abstraction prematurely.

...Separately, I'll think about a follow-up design around model identification / family detection and how (or if) we want to formalize model-specific parameters long-term.

I'll come through with doc updates and a round of refactors tomorrow morning, thanks for the input!

@dougbtv dougbtv force-pushed the dalle-compat-image-api branch 3 times, most recently from 5a788b6 to 0dee152 Compare December 19, 2025 15:59
@dougbtv
Contributor Author

dougbtv commented Dec 19, 2025

Alright, the latest push should reflect what we discussed. The general idea is that this now separates out the model-specific validations and defaults, so the endpoint is model agnostic.

The work is now narrowed to the text-to-image endpoint and its tests.

Test results from an end-to-end run using vllm serve + curl as well as a test run in gist: https://gist.github.com/dougbtv/7bc9041e2319b094321f3dcdf84f32dc

Much to my merriment, omitting parameters still appears to nominally "work" with the models I tested (Z-Image-Turbo and Qwen-Image). I'm not yet sure where the defaults are coming from, but from a UX perspective it's a good start even without model-family-specific parameter validation.

@david6666666 david6666666 added the ready label to trigger buildkite CI label Dec 22, 2025
@david6666666
Collaborator

please rebase code to test CI

Add comprehensive documentation for OpenAI DALL-E compatible image generation
API in vLLM-Omni. Documentation covers:

- API endpoint specification and request/response formats
- Quick start examples using curl, Python requests, and OpenAI SDK
- Parameter descriptions with pass-through design
- Multiple example scenarios (multiple images, negative prompts, etc.)
- Error responses and troubleshooting guidance
- Testing and development instructions

The API uses a pass-through design where parameters are forwarded directly
to the diffusion pipeline without model-specific transformation. This keeps
the API simple and focused on the OpenAI-compatible endpoint.

Signed-off-by: dougbtv <[email protected]>
Add full test coverage for the OpenAI-compatible image generation API:

**Utility Tests:**
- Size parsing with valid/invalid formats and edge cases
- Image base64 encoding/decoding
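For readers unfamiliar with what these utilities do, here is a small sketch of the kind of behavior the utility tests exercise. The function names `parse_size_sketch` and `encode_bytes_base64` are hypothetical and are not the PR's `parse_size` or `encode_image_base64`; the `WIDTHxHEIGHT` format is assumed from the DALL-E `size` parameter, and the encoder here works on already-serialized bytes rather than an image object.

```python
import base64


def parse_size_sketch(size):
    """Parse a 'WIDTHxHEIGHT' string such as '1024x1024' into (width, height)."""
    parts = size.lower().split("x")
    if len(parts) != 2:
        raise ValueError(f"invalid size {size!r}, expected 'WIDTHxHEIGHT'")
    try:
        width, height = int(parts[0]), int(parts[1])
    except ValueError:
        raise ValueError(f"invalid size {size!r}, dimensions must be integers") from None
    if width <= 0 or height <= 0:
        raise ValueError(f"invalid size {size!r}, dimensions must be positive")
    return width, height


def encode_bytes_base64(raw):
    """Base64-encode already-serialized image bytes (e.g. PNG data) as ASCII text."""
    return base64.b64encode(raw).decode("ascii")
```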

**Integration Tests:**
- Single and multiple image generation
- Custom parameters (negative prompt, seed, steps, guidance scales)
- Size validation and error handling

**Pass-Through Tests:**
- Verify parameters forwarded without modification
- Verify optional parameters omitted when not provided
- Model field validation

**Error Handling:**
- Missing required fields (422)
- Invalid parameters (422)
- Uninitialized engine (503)
- Unsupported response format (422)

All tests use mocked AsyncOmniDiffusion to avoid GPU dependencies.
The pass-through tests verify that the API correctly forwards user
parameters to the diffusion engine without model-specific transformation.
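The mocked pass-through check described above could look roughly like the following sketch. `handle_sketch` is a hypothetical handler, and the engine method name `generate` is an assumption; only the idea (forward what the user set, omit what they did not, and use a mock so no GPU is needed) comes from the commit message.

```python
import asyncio
from unittest.mock import AsyncMock


async def handle_sketch(engine, prompt, **optional):
    """Hypothetical handler: forward only the options the caller actually set."""
    kwargs = {k: v for k, v in optional.items() if v is not None}
    return await engine.generate(prompt=prompt, **kwargs)


def run_passthrough_check():
    # AsyncMock stands in for the real engine, so no GPU or model is needed.
    engine = AsyncMock()
    asyncio.run(handle_sketch(engine, "a cat", seed=42, guidance_scale=None))
    # seed is forwarded unchanged; guidance_scale was None, so it is omitted.
    engine.generate.assert_awaited_once_with(prompt="a cat", seed=42)
    return engine.generate.await_args.kwargs
```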

Signed-off-by: dougbtv <[email protected]>
Implement OpenAI-compatible /v1/images/generations endpoint for text-to-image
generation using diffusion models (Qwen-Image, Z-Image-Turbo).

**Key Features:**
- OpenAI DALL-E API compatibility (prompt, model, n, size, response_format)
- vLLM-Omni extensions (num_inference_steps, guidance_scale, true_cfg_scale,
  negative_prompt, seed)
- Pass-through parameter design - forwards user values directly to pipeline
- Automatic server detection and routing for diffusion models
- Base64 PNG response format

**Implementation:**
- api_server.py: POST /v1/images/generations endpoint with pass-through logic
- protocol/images.py: Pydantic models for request/response validation
- image_api_utils.py: Utility functions (parse_size, encode_image_base64)
- Integration with AsyncOmniDiffusion for async image generation

**Design Philosophy:**
The API uses a pass-through design where parameters are forwarded directly
to the diffusion engine without model-specific transformation. When optional
parameters are not provided, they are omitted entirely, allowing the underlying
model to use its own defaults. This keeps the API simple and focused on the
OpenAI-compatible endpoint without building model-specific abstractions.

Model mismatch warnings are logged but do not block requests (the server's
loaded model is always used).
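The warn-but-do-not-block behavior for model mismatches can be sketched as follows. The function name `resolve_model_sketch`, the logger name, and the message wording are assumptions for illustration, not code from this PR.

```python
import logging

logger = logging.getLogger("image_api_sketch")


def resolve_model_sketch(requested, loaded):
    """Return the model to use; warn, but do not fail, on a mismatch."""
    if requested is not None and requested != loaded:
        logger.warning(
            "Requested model %r does not match loaded model %r; using the loaded model.",
            requested,
            loaded,
        )
    # The server's loaded model is always used, whatever the request says.
    return loaded
```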

Signed-off-by: dougbtv <[email protected]>
@dougbtv dougbtv force-pushed the dalle-compat-image-api branch from 0dee152 to a0d8f83 Compare December 22, 2025 14:04
@dougbtv
Contributor Author

dougbtv commented Dec 22, 2025

Thanks! Rebased my branch and monitoring test results now, appreciate it.

@dougbtv
Contributor Author

dougbtv commented Dec 22, 2025

Passing existing tests, and new tests passing as well in: https://buildkite.com/vllm/vllm-omni/builds/1112#019b4661-385f-4d1a-9a59-5c82c507b160/L278

Contributor

@gcanlin gcanlin left a comment

LGTM, thanks!

Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

lgtm

@hsliuustc0106 hsliuustc0106 merged commit 74c1424 into vllm-project:main Dec 23, 2025
7 checks passed
ZeldaHuang pushed a commit to ZeldaHuang/vllm-omni that referenced this pull request Dec 25, 2025
yenuo26 pushed a commit to yenuo26/vllm-omni that referenced this pull request Dec 29, 2025
princepride pushed a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026

Labels

ready label to trigger buildkite CI


5 participants