DALL-E compatible image generation endpoint #292
hsliuustc0106 merged 3 commits into vllm-project:main
Conversation
We decided in the maintainers' call, with helpful input from Roger Wang (thank you!), to start with a single endpoint, /v1/images/generations -- I'll put that together as a next iteration.
Alright -- I've gone ahead with a refactor on this PR to address comments from Thursday's maintainers' call. The gist is that I reduced this down to just the /v1/images/generations endpoint and removed the image edit endpoint. There's still a lot that forms the basis of the single endpoint, plus plenty of testing and docs, so I broke it out into three commits with commit messages tagged [docs], [tests], and [feature] so that it's a little easier to review. Appreciate the input!
hsliuustc0106 left a comment:
please also align with #274
I've got the branch rebased on main, and I've incorporated the style used in #274 for the documentation in my docs update. Thanks for letting me know!
#259 has been merged now. Could you please rebase this PR on the newest main? Thanks!
> The server automatically enables VAE slicing and tiling for memory optimization.
> ### Invalid Size Format

Removed this section after the above fixes.
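The "Invalid Size Format" error path boils down to a small validator. As a rough sketch, assuming the parse_size utility mentioned in this PR accepts OpenAI-style "WIDTHxHEIGHT" strings (the exact implementation here is an assumption, not the PR's code):

```python
import re


def parse_size(size: str) -> tuple[int, int]:
    """Parse an OpenAI-style size string like "1024x1024" into (width, height).

    Hypothetical sketch: the real image_api_utils.parse_size may differ.
    Raises ValueError for anything that is not '<int>x<int>'.
    """
    match = re.fullmatch(r"(\d+)x(\d+)", size)
    if match is None:
        raise ValueError(f"Invalid size format {size!r}; expected 'WIDTHxHEIGHT'")
    return int(match.group(1)), int(match.group(2))
```

A server can map the ValueError to a 422 response, which matches the error codes listed in the test commit below.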
docs/.nav.yml (Outdated)

> - Online Serving:
>   - Qwen2.5-Omni: user_guide/examples/online_serving/qwen2_5_omni.md
>   - Qwen3-Omni: user_guide/examples/online_serving/qwen3_omni.md
>   - Image Generation API: user_guide/examples/online_serving/image_generation_api.md
Shall we move this somewhere else in the docs?
Happy to take suggestions on where it could be moved or improved.
This seemed like a logical place to start, but, potentially with integrations for what will now be two OpenAI-style API endpoints (completions and images/generations) that can produce image output, we could add an OpenAI-compatible API Endpoints section, maybe in the user guide?
Yes, we can check https://docs.vllm.ai/en/latest/serving/openai_compatible_server/ as a reference.
As discussed in the maintainers' call on 12/18, I'm going to scope this PR down to just adding the API endpoint and remove the "diffusion model profile" abstraction for now. This keeps the PR focused on the endpoint and avoids baking in an abstraction prematurely. Separately, I'll think about a follow-up design around model identification / family detection and how (or whether) we want to formalize model-specific parameters long-term. I'll come through with doc updates and a round of refactors tomorrow morning, thanks for the input!
Alright, the latest push should reflect the state we discussed: the model-specific validations and defaults are now a separate concern, and the endpoint is model agnostic. The work is cinched in to the text-to-image endpoint plus testing. Test results from an end-to-end run using vllm serve + curl, as well as a test run, are in this gist: https://gist.github.com/dougbtv/7bc9041e2319b094321f3dcdf84f32dc Much to my merriment, omitting parameters still appears to nominally "work" with the models I tested (Z-Image-Turbo and Qwen-Image). I'm not sure yet where the defaults are coming from, but from a UX perspective it's a good start even without model-family-specific parameter validation.
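The "omitting parameters still works" behavior follows from the pass-through design: unset fields are simply not serialized into the request, so the pipeline falls back to its own defaults. A minimal sketch of building such a request body; the field names mirror this PR's request model, but the helper itself is hypothetical:

```python
import json


def build_generation_payload(prompt: str, **optional) -> str:
    """Build a /v1/images/generations request body, dropping unset options.

    Hypothetical helper: optional fields left as None are omitted entirely,
    letting the underlying diffusion model use its own defaults.
    """
    payload = {"prompt": prompt}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return json.dumps(payload)


# seed and num_inference_steps are unset, so they never appear in the JSON.
body = build_generation_payload(
    "a watercolor fox", size="1024x1024", seed=None, num_inference_steps=None
)
```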
Please rebase the code to test CI.
Add comprehensive documentation for the OpenAI DALL-E compatible image generation API in vLLM-Omni. The documentation covers:

- API endpoint specification and request/response formats
- Quick start examples using curl, Python requests, and the OpenAI SDK
- Parameter descriptions with the pass-through design
- Multiple example scenarios (multiple images, negative prompts, etc.)
- Error responses and troubleshooting guidance
- Testing and development instructions

The API uses a pass-through design where parameters are forwarded directly to the diffusion pipeline without model-specific transformation. This keeps the API simple and focused on the OpenAI-compatible endpoint.

Signed-off-by: dougbtv <[email protected]>
Add full test coverage for the OpenAI-compatible image generation API:

**Utility Tests:**
- Size parsing with valid/invalid formats and edge cases
- Image base64 encoding/decoding

**Integration Tests:**
- Single and multiple image generation
- Custom parameters (negative prompt, seed, steps, guidance scales)
- Size validation and error handling

**Pass-Through Tests:**
- Verify parameters forwarded without modification
- Verify optional parameters omitted when not provided
- Model field validation

**Error Handling:**
- Missing required fields (422)
- Invalid parameters (422)
- Uninitialized engine (503)
- Unsupported response format (422)

All tests use a mocked AsyncOmniDiffusion to avoid GPU dependencies. The pass-through tests verify that the API correctly forwards user parameters to the diffusion engine without model-specific transformation.

Signed-off-by: dougbtv <[email protected]>
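The pass-through tests described above can be sketched with a mocked engine standing in for AsyncOmniDiffusion, so no GPU is needed. The handler function here is a hypothetical reduction of the endpoint's forwarding logic, not the PR's actual code:

```python
import asyncio
from unittest.mock import AsyncMock


async def handle_generation(engine, prompt: str, **options):
    """Hypothetical handler core: forward only the options the user set."""
    params = {"prompt": prompt}
    params.update({k: v for k, v in options.items() if v is not None})
    return await engine.generate(**params)


# Mocked engine in place of AsyncOmniDiffusion; generate() is awaitable.
engine = AsyncMock()
engine.generate.return_value = ["<png-bytes>"]

result = asyncio.run(
    handle_generation(engine, "a red fox", seed=7, negative_prompt=None)
)

# The unset negative_prompt must not reach the engine.
engine.generate.assert_called_once_with(prompt="a red fox", seed=7)
```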
Implement an OpenAI-compatible /v1/images/generations endpoint for text-to-image generation using diffusion models (Qwen-Image, Z-Image-Turbo).

**Key Features:**
- OpenAI DALL-E API compatibility (prompt, model, n, size, response_format)
- vLLM-Omni extensions (num_inference_steps, guidance_scale, true_cfg_scale, negative_prompt, seed)
- Pass-through parameter design: forwards user values directly to the pipeline
- Automatic server detection and routing for diffusion models
- Base64 PNG response format

**Implementation:**
- api_server.py: POST /v1/images/generations endpoint with pass-through logic
- protocol/images.py: Pydantic models for request/response validation
- image_api_utils.py: utility functions (parse_size, encode_image_base64)
- Integration with AsyncOmniDiffusion for async image generation

**Design Philosophy:**
The API uses a pass-through design where parameters are forwarded directly to the diffusion engine without model-specific transformation. When optional parameters are not provided, they are omitted entirely, allowing the underlying model to use its own defaults. This keeps the API simple and focused on the OpenAI-compatible endpoint without building model-specific abstractions. Model mismatch warnings are logged but do not block requests (the server's loaded model is always used).

Signed-off-by: dougbtv <[email protected]>
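The base64 PNG response format mentioned above is straightforward to sketch. encode_image_base64 is the utility name from this commit's image_api_utils.py, but this implementation and the make_response helper are assumptions for illustration:

```python
import base64
import time


def encode_image_base64(png_bytes: bytes) -> str:
    """Encode raw PNG bytes for the b64_json response field.

    Hypothetical sketch of the image_api_utils.encode_image_base64 utility.
    """
    return base64.b64encode(png_bytes).decode("ascii")


def make_response(images: list[bytes]) -> dict:
    """Assemble an OpenAI-style images response (illustrative, not the PR's code)."""
    return {
        "created": int(time.time()),
        "data": [{"b64_json": encode_image_base64(img)} for img in images],
    }


# PNG magic bytes stand in for a real generated image.
resp = make_response([b"\x89PNG\r\n\x1a\n"])
```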
Thanks! I rebased my branch and am monitoring test results now, appreciate it.
Existing tests are passing, and the new tests pass as well: https://buildkite.com/vllm/vllm-omni/builds/1112#019b4661-385f-4d1a-9a59-5c82c507b160/L278
Signed-off-by: dougbtv <[email protected]>
Signed-off-by: wangyu31577 <[email protected]>
quick overview.
This introduces a /v1/images/generations OpenAI API endpoint, intended to follow the DALL-E compatible endpoint. This enables serving diffusion models through an OpenAI-compatible API. It is in addition to generating diffusion outputs using the completions API, following the methodology defined and merged in the diffusion online serving PR #259.
cc: @fake0fan (thanks for getting the work off to a great start in 259!)
Example client implementation @ https://github.com/dougbtv/comfyui-vllm-omni/
review tips.
When reviewing, I recommend going commit by commit; the changes are broken into [docs], [testing], and [feature] commits so you can isolate just the docs, tests, and feature changes during your review.
design thoughts.
This builds directly on the async diffusion serving work introduced in #259 and adds a dedicated diffusion image-generation endpoint, rather than relying solely on the completions API.
The primary goal of this PR is to add the endpoint itself and make it usable end-to-end. While earlier iterations explored a model abstraction layer for enforcing model-specific defaults and constraints, that has been intentionally removed here to keep the scope tight.
overview.
[Feature] Add OpenAI DALL-E compatible image generation API
Builds on @fake0fan's diffusion online serving implementation to provide
a production-ready, OpenAI-compatible image generation API. Implements
the DALL-E /v1/images/generations endpoint with full async support and
proper error handling.
This implementation focuses on generation-only (not editing) to keep
the initial PR manageable while maintaining full functionality and
extensibility.
OpenAI DALL-E API Compatibility:

Unified Async Server:
- vllm serve <model> --omni command for all diffusion models

Features:
Built on @fake0fan's excellent diffusion online serving work. This PR
adds the DALL-E compatible API layer.