[FEATURE] /v1/images/edit interface#1101

Merged
hsliuustc0106 merged 1 commit into vllm-project:main from Bounty-hunter:api_image_edit
Jan 31, 2026

Conversation


@Bounty-hunter Bounty-hunter commented Jan 30, 2026


Purpose

As described in #1070:

(1) Add multipart interface: /v1/images/edits

(2) Extract common functions shared by both edit and generate: _get_engine_and_model, _parse_lora_request, _generate_with_async_omni, _update_if_not_none, _extract_images_from_result, _choose_output_format
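For illustration, the shared `_update_if_not_none` helper presumably follows a pattern like the sketch below (the actual implementation in the PR may differ): assign a field on the sampling-params object only when the caller actually supplied a value, so server-side defaults survive.

```python
from typing import Any


def _update_if_not_none(params: Any, name: str, value: Any) -> None:
    """Set params.<name> = value, skipping None so existing defaults survive."""
    if value is not None:
        setattr(params, name, value)


# Illustrative stand-in for the real sampling-params object.
class _GenParams:
    num_inference_steps = 20  # server-side default


gp = _GenParams()
_update_if_not_none(gp, "num_inference_steps", None)  # None: default kept
print(gp.num_inference_steps)  # 20
_update_if_not_none(gp, "num_inference_steps", 50)    # explicit value wins
print(gp.num_inference_steps)  # 50
```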

Test Plan

pytest

================================================================================= 31 passed, 3 warnings in 34.19s ================================================================================

end2end test

start with:

vllm serve Qwen/Qwen-Image-Edit-2511 --omni --port 8299 --default-sampling-params '{"0": {"num_inference_steps": 4, "guidance_scale": 7.5}}' --max-generated-image-size 4194304

qwen-bear.png
image002

testing:

curl -s -D >(grep -i x-request-id >&2) \
  -o >(jq -r '.data[0].b64_json' | base64 --decode > walking_4step.png) \
  -X POST "http://localhost:8299/v1/images/edits" \
  -F "model=Qwen/Qwen-Image-Edit-2511" \
  -F "image=@./qwen-bear.png" \
  -F "image=@./qwen-bear.png" \
  -F "prompt='Change the bears in the two input images into walking together.'" \
  -F "size=1024x1024" \
  -F "output_format=png" \
  -F "negative_prompt=''" \
  -F "cfg_scale=4.0" \
  -F "seed=0"
image003
curl -s -D >(grep -i x-request-id >&2) \
  -o >(jq -r '.data[0].b64_json' | base64 --decode > walking_50step.png) \
  -X POST "http://localhost:8299/v1/images/edits" \
  -F "model=Qwen/Qwen-Image-Edit-2511" \
  -F "image=@./qwen-bear.png" \
  -F "image=@./qwen-bear.png" \
  -F "prompt='Change the bears in the two input images into walking together.'" \
  -F "size=1024x1024" \
  -F "output_format=png" \
  -F "negative_prompt=''" \
  -F "cfg_scale=4.0" \
  -F "num_inference_steps=50" \
  -F "seed=0"
image005
import base64
from openai import OpenAI
client = OpenAI(
    api_key="None",
    base_url="http://localhost:8299/v1"
)

result = client.images.edit(
    model="Qwen/Qwen-Image-Edit-2511",
    image=[
        open("./qwen-bear.png", "rb"),
    ],
    prompt="Change the bear in the input image to sitting and reading a book. Keep the bear recognizable from the original image. Make the scene cozy and natural, with soft lighting, warm colors, and a harmonious background.",
    size='1024x1024',
    stream=False,
    output_format='jpeg',
    extra_body={
        "num_inference_steps": 50,
        "guidance_scale": 1.0,

    }
)

image_base64 = result.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

# Save the image to a file
with open("qwen_bear_reading.jpeg", "wb") as f:
    f.write(image_bytes)

image004

import base64
from io import BytesIO
from pathlib import Path

from openai import OpenAI
from PIL import Image
client = OpenAI(
    api_key="None",
    base_url="http://localhost:8299/v1"
)

input_image_url1 = "https://vllm-public-assets.s3.us-west-2.amazonaws.com/omni-assets/qwen-bear.png"
input_image_url2 = "https://vllm-public-assets.s3.us-west-2.amazonaws.com/omni-assets/qwen-bear.png"

def _encode_image_as_data_url(input_path: Path) -> str:
    image_bytes = input_path.read_bytes()
    try:
        img = Image.open(BytesIO(image_bytes))
        mime_type = f"image/{img.format.lower()}" if img.format else "image/png"
    except Exception:
        mime_type = "image/png"
    image_b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{image_b64}"


url = _encode_image_as_data_url(Path("./qwen-bear.png"))
result = client.images.edit(
    image=[],
    model="Qwen-Image-Edit-2511",
    prompt="Change the bears in the three input images into sitting together and eating a meal.",
    size='1024x1024',
    stream=False,
    output_format='jpeg',
    # URL-format inputs
    extra_body={
        "url": [input_image_url1, input_image_url1, url],
        "num_inference_steps": 50,
        "guidance_scale": 1.0,
        "negative_prompt": "",
        "seed": 0,
    }
)

image_base64 = result.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

# Save the image to a file
with open("edit_out_http.jpeg", "wb") as f:
    f.write(image_bytes)

image001 (1)


 curl -X POST "http://localhost:8299/v1/images/edits" \
  -F "model=/home/d00806799/Qwen-Image-Edit-2511" \
  -F "image=@./image_edit.png" \
  -F "image=@./qwen-bear.png" \
  -F "prompt='bear from image1 and image2 fight.'" \
  -F "size=4096x4096" \
  -F "output_format=png" \
  -F "output_compression=100"

{"error":{"message":"Requested image size 4096x4096 exceeds the maximum allowed size of 4194304.0 pixels.","type":"Bad Request","param":null,"code":400}}
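The rejection above follows from simple arithmetic: 4096 × 4096 = 16,777,216 pixels, well over the 4,194,304-pixel cap passed via --max-generated-image-size. A minimal sketch of such a server-side check (illustrative only; the function name and exact wiring in the PR may differ):

```python
# Sketch of a pixel-count limit check like the one behind the 400 above.
def check_generated_image_size(width: int, height: int, max_pixels: float) -> None:
    requested = width * height
    if requested > max_pixels:
        raise ValueError(
            f"Requested image size {width}x{height} exceeds the maximum "
            f"allowed size of {max_pixels} pixels."
        )


check_generated_image_size(1024, 1024, 4194304.0)  # 1,048,576 px: within limit

try:
    check_generated_image_size(4096, 4096, 4194304.0)  # 16,777,216 px: too large
except ValueError as e:
    msg = str(e)
print(msg)
```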

Test Result



@chatgpt-codex-connector bot commented:

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 789b81cb8a

# a proper generator is initialized in the backend.
# This fixes issues where using the default global generator
# might produce blurry images in some environments.
gen_params.seed = random.randint(0, 2**32 - 1)
Collaborator:

Why remove random seed?

Contributor Author:

Actually it is not removed; we changed it to:

_update_if_not_none(gen_params, "seed", random.randint(0, 2**32 - 1) if seed is None else seed)

# 3.2 Parse and add size if provided
width, height = None, None
if size:
    width, height = parse_size(size)
Contributor:

We should fall back to detecting the image's size if the given size is illegal.

Contributor Author:

Actually, the illegal-size check is in parse_size (vllm_omni/entrypoints/openai/image_api_utils.py:parse_size), which raises an error if the size is illegal.

Contributor:

Actually, the illegal-size check is in parse_size (vllm_omni/entrypoints/openai/image_api_utils.py:parse_size), which raises an error if the size is illegal.

I hope that when size equals "auto", instead of directly throwing an error, we can use the size of the first image.

Contributor Author:

Changed.
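The adjusted behavior discussed here might look roughly like this sketch (names such as `parse_size_or_auto` and the exact fallback rules are assumptions, not the PR's actual code): "auto" falls back to the first input image's size instead of raising, while other strings must match "<width>x<height>".

```python
import re
from typing import Optional, Sequence, Tuple


def parse_size_or_auto(
    size: Optional[str],
    image_sizes: Sequence[Tuple[int, int]],
) -> Tuple[Optional[int], Optional[int]]:
    """Parse a size string; 'auto' uses the first input image's size."""
    if not size:
        return None, None
    if size.lower() == "auto":
        if image_sizes:
            return image_sizes[0]
        return None, None
    m = re.fullmatch(r"(\d+)x(\d+)", size)
    if m is None:
        raise ValueError(f"Invalid size {size!r}; expected '<width>x<height>' or 'auto'")
    return int(m.group(1)), int(m.group(2))


print(parse_size_or_auto("1024x1024", []))       # (1024, 1024)
print(parse_size_or_auto("auto", [(768, 512)]))  # (768, 512)
```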

_update_if_not_none(gen_params, "height", height)

# 3.3 Add optional parameters ONLY if provided
_update_if_not_none(gen_params, "num_inference_steps", num_inference_steps)
Contributor:

How do we specify default sampling parameters (such as num_inference_steps, guidance_scale, and true_cfg_scale) when starting the server?

Contributor Author:

We added --default-sampling-params and --max-generated-image-size to initialize the system-level default sampling parameters and the size limit.
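As the serve command earlier shows, --default-sampling-params takes a JSON string keyed by stage index ("0"). A minimal sketch of how such a flag might be parsed into per-stage defaults (the helper name here is illustrative, not the real API):

```python
import json


def parse_default_sampling_params(raw: str) -> dict:
    """Parse the --default-sampling-params JSON string into {stage: params}."""
    parsed = json.loads(raw)
    return {int(stage): dict(params) for stage, params in parsed.items()}


# Same JSON as passed to `vllm serve` above.
defaults = parse_default_sampling_params(
    '{"0": {"num_inference_steps": 4, "guidance_scale": 7.5}}'
)
print(defaults[0])  # {'num_inference_steps': 4, 'guidance_scale': 7.5}
```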

@david6666666 david6666666 added this to the v0.14.0 milestone Jan 30, 2026
@Bounty-hunter Bounty-hunter force-pushed the api_image_edit branch 4 times, most recently from ed15ecf to 3879890 on January 30, 2026 14:13
@Bounty-hunter Bounty-hunter changed the title [WIP]images/edit interface [Feature] /v1/images/edit interface Jan 30, 2026
@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Jan 30, 2026
@hsliuustc0106 hsliuustc0106 requested a review from ZJY0516 January 30, 2026 15:10
@ZJY0516
Collaborator

ZJY0516 commented Jan 30, 2026

@Bounty-hunter Could you also update related docs? And it will be great if you can test qwen image layerd

@hsliuustc0106
Collaborator

Please use benchmark/diffusion to run the long-running test.

@hsliuustc0106
Collaborator

Add accuracy results with image output.

omni_config_group.add_argument(
    "--default-sampling-params",
    type=str,
    help="Json str for Default sampling parameters, \n"
Collaborator:

Could you explain why we need to add these? And I think reading from a config file would be more user-friendly.

Collaborator:

The goal is to provide a serve CLI option that overrides the built-in default sampling params.

@hsliuustc0106
Collaborator

fix ci please

@Bounty-hunter Bounty-hunter force-pushed the api_image_edit branch 3 times, most recently from 7f006d7 to 618510c on January 31, 2026 03:46
@Bounty-hunter Bounty-hunter changed the title [Feature] /v1/images/edit interface [WIP] /v1/images/edit interface Jan 31, 2026
Signed-off-by: dengyunyang <584797741@qq.com>
@Bounty-hunter Bounty-hunter changed the title [WIP] /v1/images/edit interface [FEATURE] /v1/images/edit interface Jan 31, 2026
)
# Diffusion model mixed precision
omni_config_group.add_argument(
    "--max-generated-image-size",
Collaborator:

It seems that we only check this in image edit, what about image generation?

Contributor Author:

The two params are only used for image edit for now.

Collaborator:

could you extend it in a follow-up PR?

@ZJY0516
Collaborator

ZJY0516 commented Jan 31, 2026

please also update related test: test_image_gen_edit.py

@Bounty-hunter
Contributor Author

please also update related test: test_image_gen_edit.py

I plan to update the following in a follow-up PR:
(1) test_image_gen_edit.py
(2) adapt benchmark/diffusion to /v1/images/edit
(3) new params for the /v1/images/generations interface

@hsliuustc0106 hsliuustc0106 merged commit 70a5de9 into vllm-project:main Jan 31, 2026
7 checks passed
@gcanlin
Contributor

gcanlin commented Jan 31, 2026

Tested the Qwen-Image-Edit official example on NPU. The accuracy looks good:

 vllm serve Qwen/Qwen-Image-Edit-2511 --omni --port 8299 --default-sampling-params '{"0": {"num_inference_steps": 4, "guidance_scale": 7.5}}' --max-generated-image-size 4194304 --enforce_eager --vae-use-slicing --vae-use-tiling
import base64
from openai import OpenAI
client = OpenAI(
    api_key="None",
    base_url="http://localhost:8299/v1"
)

result = client.images.edit(
    model="Qwen/Qwen-Image-Edit-2511",
    image=[
        open("./cp-1.png", "rb"),
        open("./cp-2.png", "rb"),
    ],
    prompt="Based on the woman in image 1 and the man in image 2, generate a set of wedding photos following this description: the groom wears a red Chinese-style magua jacket, and the bride wears an exquisite xiuhe dress with a golden phoenix coronet. They stand side by side in front of an ancient vermilion palace wall, with carved wooden windows in the background. The lighting is bright and soft, the composition is symmetrical, and the atmosphere is festive and solemn.",
    size='1024x1024',
    stream=False,
    output_format='jpeg',
    extra_body={
        "num_inference_steps": 40,
        "guidance_scale": 0,
        "negative_prompt": " "
    }
)

image_base64 = result.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

# Save the image to a file
with open("cp.jpeg", "wb") as f:
    f.write(image_bytes)
image image

dongbo910220 pushed a commit to dongbo910220/vllm-omni that referenced this pull request Feb 1, 2026
Signed-off-by: dengyunyang <584797741@qq.com>