FEAT: auto batch embedding by qinxuye · Pull Request #4197 · xorbitsai/inference

qinxuye · 2025-11-02T16:29:04Z

Before this PR, embedding requests were processed sequentially, even when users submitted multiple embedding requests at the same time.

After this PR, a model class can inherit BatchMixin and provide a batch version of its method (using the xoscar batch API). Each request is put into a queue, and a background coroutine collects as many items as possible and handles them in a single call.
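To illustrate the queue-plus-background-coroutine idea described above, here is a minimal sketch in plain asyncio. It is not the code from this PR and does not use BatchMixin or the xoscar batch API; `AutoBatcher`, `submit`, `max_batch_size`, and `fake_embed_batch` are hypothetical names introduced only for this example.

```python
import asyncio
from typing import Callable, List, Optional, Sequence

Vector = List[float]


class AutoBatcher:
    """Hypothetical helper: queue single requests and serve them in batches."""

    def __init__(self, batch_fn: Callable[[Sequence[str]], List[Vector]],
                 max_batch_size: int = 32) -> None:
        self._batch_fn = batch_fn
        self._max_batch_size = max_batch_size
        self._queue: Optional[asyncio.Queue] = None
        self._worker: Optional[asyncio.Task] = None

    async def submit(self, text: str) -> Vector:
        # Create the queue and the background coroutine lazily, so both
        # are bound to the event loop that is actually running.
        if self._worker is None:
            self._queue = asyncio.Queue()
            self._worker = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((text, fut))
        return await fut

    async def _run(self) -> None:
        while True:
            # Wait until at least one request arrives.
            items = [await self._queue.get()]
            # Then collect everything already queued, up to the cap.
            while len(items) < self._max_batch_size and not self._queue.empty():
                items.append(self._queue.get_nowait())
            try:
                # One call for the whole batch (synchronous in this sketch,
                # so it blocks the event loop while it runs).
                results = self._batch_fn([text for text, _ in items])
            except Exception as exc:
                for _, fut in items:
                    if not fut.done():
                        fut.set_exception(exc)
                continue
            for (_, fut), result in zip(items, results):
                if not fut.done():
                    fut.set_result(result)

    async def close(self) -> None:
        # Stop the background coroutine.
        if self._worker is not None:
            self._worker.cancel()
            try:
                await self._worker
            except asyncio.CancelledError:
                pass
            self._worker = None


calls: List[int] = []  # records the size of every batch that was executed


def fake_embed_batch(texts: Sequence[str]) -> List[Vector]:
    # Stand-in for a real batched embedding call.
    calls.append(len(texts))
    return [[float(len(t))] for t in texts]


async def demo() -> List[Vector]:
    batcher = AutoBatcher(fake_embed_batch, max_batch_size=8)
    out = await asyncio.gather(*(batcher.submit("x" * i) for i in range(5)))
    await batcher.close()
    return out


if __name__ == "__main__":
    print(asyncio.run(demo()))
    print("batch sizes:", calls)
```

Callers still await one result per request; only the background coroutine sees the batch. Requests that arrive while a batch is being processed simply wait in the queue and are picked up together on the next iteration.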

This is an initial version of auto batching. It could actually be applied to all models, not only auto-regressive models (basically LLMs).

Fixes #4123

qinxuye · 2025-11-02T16:32:02Z

@llyycchhee please help me check this PR, and see if there's anything that can be improved.

Benchmarks welcome.

qinxuye · 2025-11-10T15:04:00Z

Could you paste some benchmarks to illustrate the results? @llyycchhee

xinference/deploy/local.py

xinference/constants.py

xinference/core/tests/test_restful_api.py

xinference/model/embedding/llama_cpp/core.py

xinference/model/embedding/sentence_transformers/core.py

xinference/model/embedding/vllm/core.py

qinxuye · 2025-11-12T02:49:07Z

Benchmark with 256 queries.

The improvements are huge.

llyycchhee

LGTM

qinxuye added 3 commits October 31, 2025 18:24

FEAT: support batch embedding

281996f

FEAT: auto batch embedding

d2dbca9

optimize batch method

dabb6ce

XprobeBot added the feature label Nov 2, 2025

XprobeBot added this to the v1.x milestone Nov 2, 2025

FEAT: abstract func & set env

473a701

llyycchhee added 3 commits November 10, 2025 17:19

feat(embedding): modify testcase

67043b9

feat(embedding): modify testcase

9607ff8

feat(embedding): modify testcase

dba3990

qinxuye commented Nov 11, 2025

llyycchhee added 2 commits November 11, 2025 10:35

feat(embedding): modify by comments

b1c1efa

feat(embedding): modify by comments

b80ca0d

feat(embedding): adapt py39 asyncio

613487d

llyycchhee approved these changes Nov 12, 2025

qinxuye merged commit c550277 into xorbitsai:main Nov 12, 2025
11 of 14 checks passed

qinxuye deleted the feat/batch branch November 12, 2025 11:09

ZhikaiGuo960110 mentioned this pull request Dec 29, 2025

[Loading inference model on CPU] xinference vllm embedding performance is lower than native vllm serve at concurrency 1 #4418

Closed

3 tasks

qinxuye merged 10 commits into xorbitsai:main from qinxuye:feat/batch
