
Add Qwen3-VL-Embedding-2B image input support #232

Merged
Anush008 merged 3 commits into Anush008:main from cornmander:codex/qwen3-vl-embedding-2b-image on Mar 5, 2026

@cornmander
Contributor

Summary

  • add a new Qwen3VLEmbedding API for Qwen3-VL multimodal embedding models
  • support image inputs via embed_images(...) and embed_image_bytes(...)
  • keep text support for Qwen3-VL via embed_texts(...) and existing Qwen3TextEmbedding
  • add an internal Qwen3-VL vision module for image token embedding/injection
  • update exports, feature wiring, README examples, and qwen3 tests

Implementation notes

  • uses the qwen3 feature (candle backend) and enables dep:image for this feature
  • image preprocessing follows Qwen3-VL patch/grid behavior (patch_size, merge_size, temporal duplication)
  • prompt expansion replaces a single <|image_pad|> placeholder with the exact number of image patch tokens
  • image embeddings are injected into the text stream before final hidden-state pooling
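For readers following the preprocessing notes, here is a minimal sketch of the grid arithmetic, the `<|image_pad|>` expansion, and the injection step. It assumes `patch_size = 16` and `merge_size = 2` (pass your model's real values). It treats a still image as a single temporal patch after duplication, so `t = 1`, and it uses plain `Vec<f32>` rows instead of candle tensors. `PatchGrid`, `expand_image_placeholder`, and `inject_image_embeddings` are hypothetical names, not the crate's API.

```rust
// Hypothetical helpers for illustration; not the fastembed-rs implementation.

/// Grid of vision patches for one image, in (t, h, w) order.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PatchGrid {
    pub t: usize,
    pub h: usize,
    pub w: usize,
}

impl PatchGrid {
    /// Grid for an image already resized to multiples of patch_size * merge_size.
    /// A still image is duplicated along time to fill one temporal patch, so t == 1.
    pub fn for_image(height: usize, width: usize, patch_size: usize, merge_size: usize) -> Self {
        let factor = patch_size * merge_size;
        assert!(height % factor == 0 && width % factor == 0, "resize the image first");
        PatchGrid { t: 1, h: height / patch_size, w: width / patch_size }
    }

    /// LLM tokens left after merging each merge_size x merge_size block of patches.
    pub fn merged_tokens(&self, merge_size: usize) -> usize {
        self.t * self.h * self.w / (merge_size * merge_size)
    }
}

/// Replaces the single placeholder with one placeholder per merged image token.
pub fn expand_image_placeholder(prompt: &str, placeholder: &str, n_tokens: usize) -> String {
    assert_eq!(prompt.matches(placeholder).count(), 1, "expected exactly one placeholder");
    prompt.replacen(placeholder, &placeholder.repeat(n_tokens), 1)
}

/// Overwrites the text embedding rows at placeholder positions with image embeddings, in order.
pub fn inject_image_embeddings(
    token_ids: &[u32],
    text_embeds: &mut [Vec<f32>],
    image_pad_id: u32,
    image_embeds: &[Vec<f32>],
) {
    let mut next = image_embeds.iter();
    for (id, row) in token_ids.iter().zip(text_embeds.iter_mut()) {
        if *id == image_pad_id {
            let img = next.next().expect("fewer image embeddings than placeholders");
            row.clone_from(img);
        }
    }
    assert!(next.next().is_none(), "more image embeddings than placeholders");
}
```

For example, a 448x320 resized image with 16-pixel patches and a 2x2 merge gives a 1x28x20 grid, so the prompt needs 140 `<|image_pad|>` tokens.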

Validation

  • cargo fmt --all
  • cargo check
  • cargo test --features qwen3 models::qwen3::tests -- --nocapture
  • cargo test --features qwen3 --test qwen3 qwen3_vl_2b_text_embed -- --nocapture
  • cargo test --features qwen3 --test qwen3 qwen3_vl_2b_image_embed -- --nocapture

New/updated interfaces

  • Qwen3VLEmbedding::from_hf(...)
  • Qwen3VLEmbedding::embed_texts(...)
  • Qwen3VLEmbedding::embed_images(...)
  • Qwen3VLEmbedding::embed_image_bytes(...)

@cornmander
Contributor Author

Pushed a follow-up formatting fix in 7894acd for the cargo fmt --all -- --check failure (import wrapping in src/models/qwen3.rs).

Current run status is action_required with no jobs started yet, so this PR likely needs a maintainer to approve workflow execution for the forked branch before CI can run again.

Owner

@Anush008 Anush008 left a comment


Thanks for taking the time to contribute @cornmander

```rust
let images = ["tests/assets/image_0.png", "tests/assets/image_1.png"];
let embeddings = model.embed_images(&images).expect("embed images");

assert_eq!(embeddings.len(), images.len());
```

Owner


Please add assertions that compare the embedding values against those produced by the Python counterpart.

This is what we do in https://github.com/Anush008/fastembed-rs/blob/main/tests/text-embeddings.rs

We have to ensure that Python and Rust produce the same vectors.
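Here is a minimal sketch of the kind of parity check being requested. It assumes the reference numbers (a sum and a few leading dimensions) were recorded from the Python model beforehand. `Reference`, `cosine`, and `assert_matches_reference` are hypothetical helpers, not code taken from tests/text-embeddings.rs, and the test author chooses the tolerance.

```rust
// Hypothetical parity helpers for illustration; not taken from the fastembed-rs test suite.

/// Values recorded from the Python model for one input (hypothetical layout).
pub struct Reference<'a> {
    pub sum: f32,
    pub leading: &'a [f32],
}

/// Cosine similarity, e.g. to compare an image/text pair's score with Python's score.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Checks the vector's sum and its first `reference.leading.len()` values within `tol`.
pub fn assert_matches_reference(embedding: &[f32], reference: &Reference<'_>, tol: f32) {
    let sum: f32 = embedding.iter().sum();
    assert!((sum - reference.sum).abs() <= tol, "sum {sum} differs from Python {}", reference.sum);
    for (i, (got, want)) in embedding.iter().zip(reference.leading).enumerate() {
        assert!((got - want).abs() <= tol, "dim {i}: {got} vs Python {want}");
    }
}
```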

@cornmander
Contributor Author

Addressed the review feedback in a5c7c2f.

Changes made:

  • Added Python-reference assertions in tests/qwen3.rs for qwen3_vl_2b_image_embed (embedding sums, leading dimensions, and cosine similarity).
  • Aligned Rust VL image path with the official Python behavior:
    • Python-compatible ties-to-even image resize rounding.
    • MRoPE position-id construction for image tokens.
    • Interleaved MRoPE rotary application from mrope_section.
    • DeepStack visual feature injection into early decoder layers.
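The ties-to-even item matters because Rust's `f64::round` rounds halves away from zero, while Python's `round` rounds them to the even neighbour. A side whose ratio to the resize factor is exactly x.5 can therefore snap to a different multiple. The sketch below is modeled on the Qwen2-VL-style `smart_resize` used by Hugging Face processors. The clamps, `factor`, `min_pixels`, and `max_pixels` are assumptions to check against the Qwen3-VL processor. It relies on `f64::round_ties_even`, which requires Rust 1.77 or later.

```rust
// Hypothetical sketch of a Qwen-VL style resize; not the fastembed-rs implementation.

/// Rounds value / factor with ties going to the even neighbour (like Python's round),
/// then scales back up to a multiple of factor.
fn round_by_factor(value: f64, factor: usize) -> usize {
    ((value / factor as f64).round_ties_even() as usize) * factor
}

/// Snaps both sides to multiples of factor (assumed to be patch_size * merge_size)
/// while keeping the total pixel count within [min_pixels, max_pixels].
pub fn smart_resize(
    height: usize,
    width: usize,
    factor: usize,
    min_pixels: usize,
    max_pixels: usize,
) -> (usize, usize) {
    let (h, w) = (height as f64, width as f64);
    let mut h_bar = round_by_factor(h, factor).max(factor);
    let mut w_bar = round_by_factor(w, factor).max(factor);
    if h_bar * w_bar > max_pixels {
        // Too many pixels: shrink by a common scale and round down.
        let beta = (h * w / max_pixels as f64).sqrt();
        h_bar = ((h / beta / factor as f64).floor() as usize * factor).max(factor);
        w_bar = ((w / beta / factor as f64).floor() as usize * factor).max(factor);
    } else if h_bar * w_bar < min_pixels {
        // Too few pixels: grow by a common scale and round up.
        let beta = (min_pixels as f64 / (h * w)).sqrt();
        h_bar = (h * beta / factor as f64).ceil() as usize * factor;
        w_bar = (w * beta / factor as f64).ceil() as usize * factor;
    }
    (h_bar, w_bar)
}
```

For example, an 80-pixel side with `factor = 32` gives a ratio of exactly 2.5. That ratio snaps to 64 here but would snap to 96 with `f64::round`.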

Validation run:

  • RUN_QWEN3_VL_2B_IMAGE=1 cargo test --features qwen3 --test qwen3 qwen3_vl_2b_image_embed -- --nocapture
  • RUN_QWEN3_VL_2B=1 cargo test --features qwen3 --test qwen3 qwen3_vl_2b_text_embed -- --nocapture
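The MRoPE items in a5c7c2f can be illustrated with a small sketch, assuming the Qwen2-VL-style position scheme carries over to still images. Image tokens receive (t, h, w) ids offset from the preceding text position, and the text that follows resumes one past the largest id. Interleaving is modeled on the Hugging Face Qwen3-VL rotary code. Frequency slots start as temporal; slots 1, 4, 7, ... switch to height and slots 2, 5, 8, ... switch to width, up to `3 * mrope_section[axis]`. `interleaved_axes`, `image_position_ids`, and `rotary_angles` are hypothetical names.

```rust
// Hypothetical MRoPE sketch for illustration; not the fastembed-rs implementation.

/// Which axis (0 = temporal, 1 = height, 2 = width) drives each rotary frequency slot.
pub fn interleaved_axes(mrope_section: [usize; 3]) -> Vec<usize> {
    let half_dim: usize = mrope_section.iter().sum();
    let mut axes = vec![0; half_dim];
    for axis in 1..=2 {
        let end = (3 * mrope_section[axis]).min(half_dim);
        for j in (axis..end).step_by(3) {
            axes[j] = axis;
        }
    }
    axes
}

/// (t, h, w) ids for one image whose merged grid is grid_t x grid_h x grid_w,
/// starting at text position `start`. Also returns the next text position.
pub fn image_position_ids(
    start: usize,
    grid_t: usize,
    grid_h: usize,
    grid_w: usize,
) -> (Vec<[usize; 3]>, usize) {
    let mut ids = Vec::with_capacity(grid_t * grid_h * grid_w);
    for t in 0..grid_t {
        for h in 0..grid_h {
            for w in 0..grid_w {
                ids.push([start + t, start + h, start + w]);
            }
        }
    }
    // Following text resumes one past the largest position used by the image.
    let next = start + grid_t.max(grid_h).max(grid_w);
    (ids, next)
}

/// Rotation angle per frequency slot: the slot's axis picks which position component
/// is multiplied by the standard RoPE inverse frequency theta^(-2j / head_dim).
pub fn rotary_angles(pos: [usize; 3], axes: &[usize], theta: f64) -> Vec<f64> {
    let half_dim = axes.len();
    axes.iter()
        .enumerate()
        .map(|(j, &axis)| {
            let inv_freq = theta.powf(-(2.0 * j as f64) / (2 * half_dim) as f64);
            pos[axis] as f64 * inv_freq
        })
        .collect()
}
```

With `mrope_section = [24, 20, 20]` (an assumed value; read it from the model config), the first slots cycle T, H, W, and the last four fall back to T.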

Owner

@Anush008 Anush008 left a comment


Thanks @cornmander

@Anush008 Anush008 merged commit b9a6280 into Anush008:main Mar 5, 2026
1 check passed
github-actions bot pushed a commit that referenced this pull request Mar 5, 2026
## [5.12.0](v5.11.0...v5.12.0) (2026-03-05)

### 🍕 Features

* Add Qwen3-VL-Embedding-2B image input support ([#232](#232)) ([b9a6280](b9a6280))
@github-actions

github-actions bot commented Mar 5, 2026

🎉 This PR is included in version 5.12.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
