[FEATURE] /v1/images/edit interface by Bounty-hunter · Pull Request #1101 · vllm-project/vllm-omni

Bounty-hunter · 2026-01-30T04:26:49Z

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.

Purpose

As describe in #1070

(1) Add multipart interface : /v1/images/edits

(2) extract common function for both edit and generate: _get_engine_and_model _parse_lora_request _generate_with_async_omni _update_if_not_none _extract_images_from_result _choose_output_format

Test Plan

pytest

================================================================================= 31 passed, 3 warnings in 34.19s ================================================================================

end2end test

start with:

vllm serve Qwen/Qwen-Image-Edit-2511 --omni --port 8299 --default-sampling-params '{"0": {"num_inference_steps": 4, "guidance_scale": 7.5}}' --max-generated-image-size 4194304

qwem-bear.png

testing:

curl -s -D >(grep -i x-request-id >&2) \
  -o >(jq -r '.data[0].b64_json' | base64 --decode > walking_4step.png) \
  -X POST "http://localhost:8299/v1/images/edits" \
  -F "model=Qwen/Qwen-Image-Edit-2511" \
  -F "image=@./qwen-bear.png" \
  -F "image=@./qwen-bear.png" \
  -F "prompt='Change the bears in the two input images into walking togather.'" \
  -F "size=1024x1024" \
  -F "output_format=png" \
  -F "negative_prompt=''" \
  -F "cfg_scale=4.0" \
  -F "seed=0"

curl -s -D >(grep -i x-request-id >&2) \
  -o >(jq -r '.data[0].b64_json' | base64 --decode > walking_50step.png) \
  -X POST "http://localhost:8299/v1/images/edits" \
  -F "model=Qwen/Qwen-Image-Edit-2511" \
  -F "image=@./qwen-bear.png" \
  -F "image=@./qwen-bear.png" \
  -F "prompt='Change the bears in the two input images into walking togather.'" \
  -F "size=1024x1024" \
  -F "output_format=png" \
  -F "negative_prompt=''" \
  -F "cfg_scale=4.0" \
  -F "num_inference_steps=50" \
  -F "seed=0"

import base64
from openai import OpenAI
import os
client = OpenAI(
    api_key="None",
    base_url="http://localhost:8299/v1"
)

result = client.images.edit(
    model="Qwen/Qwen-Image-Edit-2511",
    image=[
        open("./qwen-bear.png", "rb"),
    ],
    prompt="Change the bear in the input image to sitting and reading a book. Keep the bear recognizable from the original image. Make the scene cozy and natural, with soft lighting, warm colors, and a harmonious background.",
    size='1024x1024',
    stream=False,
    output_format='jpeg',
    extra_body={
        "num_inference_steps": 50,
        "guidance_scale": 1.0,

    }
)

image_base64 = result.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

# Save the image to a file
with open("qwen_bear_reading.jpeg", "wb") as f:
    f.write(image_bytes)

import base64
from openai import OpenAI
from pathlib import Path
client = OpenAI(
    api_key="None",
    base_url="http://localhost:8299/v1"
)

input_image_url1 = "https://vllm-public-assets.s3.us-west-2.amazonaws.com/omni-assets/qwen-bear.png"
input_image_url2 = "https://vllm-public-assets.s3.us-west-2.amazonaws.com/omni-assets/qwen-bear.png"

def _encode_image_as_data_url(input_path: Path) -> str:
    image_bytes = input_path.read_bytes()
    try:
        img = Image.open(BytesIO(image_bytes))
        mime_type = f"image/{img.format.lower()}" if img.format else "image/png"
    except Exception:
        mime_type = "image/png"
    image_b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{image_b64}"


url = _encode_image_as_data_url(Path("./qwen-bear.png"))
result = client.images.edit(
    image=[],
    model="Qwen-Image-Edit-2511",
    prompt="Change the bears in the three input images into sitting together and eating a meal.",
    size='1024x1024',
    stream=False,
    output_format='jpeg',
    # url格式
    extra_body={
        "url": [input_image_url1, input_image_url1, url],
        "num_inference_steps": 50,
        "guidance_scale": 1.0,
        "negative_prompt": "",
        "seed": 0,
    }
)

image_base64 = result.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

# Save the image to a file
with open("edit_out_http.jpeg", "wb") as f:
    f.write(image_bytes)


 curl -X POST "http://localhost:8299/v1/images/edits" \
  -F "model=/home/d00806799/Qwen-Image-Edit-2511" \
  -F "image=@./image_edit.png" \
  -F "image=@./qwen-bear.png" \
  -F "prompt='bear from image1 and image2 fight.'" \
  -F "size=4096x4096" \
  -F "output_format=png" \
  -F "output_compression=100"

{"error":{"message":"Requested image size 4096x4096 exceeds the maximum allowed size of 4194304.0 pixels.","type":"Bad Request","param":null,"code":400}}

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 789b81cb8a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".