
cld2labs/HybridSearch#74

Open
arpannookala-12 wants to merge 20 commits into opea-project:main from cld2labs:cld2labs/HybridSearch

Conversation

@arpannookala-12
Contributor

This is a straightforward initial addition of the full HybridSearch sample — five microservices (gateway, embedding, retrieval, ingestion, LLM), a Streamlit UI, docker-compose, and supporting scripts/docs.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
@arpannookala-12 changed the title from "Add HybridSearch sample solution" to "cld2labs:cld2labs/HybridSearch" on Mar 10, 2026
@arpannookala-12 changed the title from "cld2labs:cld2labs/HybridSearch" to "cld2labs/HybridSearch" on Mar 11, 2026
@alexsin368 self-requested a review on March 20, 2026 at 23:04
Address PR review comments: correct the git clone URL to
opea-project/Enterprise-Inference, align model configuration with
.env.example, and add a prerequisite section listing required models.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
Use consistent `docker compose` (not `docker-compose`) and list log
commands for all individual services for thoroughness.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
@arpannookala-12
Contributor Author

Addressed all 4 review comments:

  1. Repo URL — Updated clone URL to opea-project/Enterprise-Inference with correct cd path
  2. Model config — Aligned model names with .env.example (bge-base-en-v1.5, bge-reranker-base, Qwen3-4B-Instruct-2507) and added a "Required Models" prerequisite section
  3. docker compose — Changed docker-compose to docker compose for consistency
  4. Per-service logs — Added individual log commands for all services (gateway, embedding, retrieval, llm, ingestion, ui)

@alexsin368
Collaborator

@arpannookala-12 please resolve my comments when you address them. Also, I'm encountering an error with the ingestion service even though my embedding model is deployed and functional. I deployed EI with Keycloak. It seems there is an issue with the URL to the model endpoint, which should be https://api.example.com/bge-base-en-v1.5/v1/embeddings

[screenshot: ingestion service error]

@arpannookala-12
Contributor Author

@alexsin368 Sure, I'll resolve the comments as I address them. I'm working on understanding the issue with the Keycloak EI deployment and will get back to you on this.

Add EMBEDDING_API_ENDPOINT, RERANKER_API_ENDPOINT, and LLM_API_ENDPOINT
config vars so each service can target its own APISIX route. When set,
the service uses the per-model URL; when unset, it falls back to
GENAI_GATEWAY_URL for GenAI Gateway compatibility. Consistent with the
pattern used by RAGChatbot and other sample solutions.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
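The fallback behavior described in this commit can be sketched as a minimal helper (illustrative only; `resolve_endpoint` and its signature are hypothetical names, not the actual api_client code):

```python
import os

def resolve_endpoint(per_model_var: str) -> str:
    """Prefer the per-model APISIX route; fall back to GENAI_GATEWAY_URL.

    EMBEDDING_API_ENDPOINT, RERANKER_API_ENDPOINT, and LLM_API_ENDPOINT are
    the per-model variables named in the commit; the helper is illustrative.
    """
    url = os.environ.get(per_model_var) or os.environ.get("GENAI_GATEWAY_URL", "")
    return url.rstrip("/")

# e.g. the embedding service would resolve:
# resolve_endpoint("EMBEDDING_API_ENDPOINT")
```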
@arpannookala-12
Contributor Author

@alexsin368 I added config support and api_client changes, then tested again with the following values set:

[screenshot: Screenshot 2026-03-27 at 6 52 03 PM]

Note that the reranker model was not deployed, so its value is a placeholder; the conclusion is that the embedding model worked with these changes.

…n.md

- api_client.py: Remove /v1 from reranker URL (TEI uses /rerank, not /v1/rerank);
  add model name to rerank payload per TEI API requirements
- reranker-configuration.md: Scope guide to Xeon-only deployments with a note that
  Gaudi/TEI works out of the box; remove spurious :4000 port from BASE_URL; add
  TOKEN variable setup and replace literal "Token" with ${TOKEN} in all curl commands

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
@alexsin368
Collaborator

Functional when EI is deployed with GenAI Gateway. For the Keycloak deployment, I'm still having an issue with the reranker. It seems the field it looks for is "texts" instead of "documents", because the bge-reranker-base model is deployed with vLLM on Gaudi. A fix would be to look for "texts" when EI is deployed with Keycloak.

Gaudi (TEI) serves endpoints without /v1 prefix (/embeddings, /rerank)
while Xeon (vLLM) uses the /v1 prefix (/v1/embeddings, /v1/rerank).

- Add INFERENCE_BACKEND=vllm|tei to all three config.py files
- Update embedding, retrieval, and llm api_client.py to branch URL
  construction based on INFERENCE_BACKEND
- Pass INFERENCE_BACKEND through docker-compose.yml for all three services
- Add INFERENCE_BACKEND to .env.example with hardware guidance
- Scope reranker-configuration.md to GenAI Gateway + Xeon only
- Update README to reflect GenAI Gateway + Xeon scope and note that
  Keycloak tokens can be configured for longer TTL in Keycloak console

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
@arpannookala-12
Contributor Author

arpannookala-12 commented Apr 3, 2026

@alexsin368 — thanks for the findings on Gaudi + GenAI Gateway. Based on your testing, here's the full picture we've now confirmed:

| Enterprise Inference | Hardware | Reranker config required | Code change needed |
| --- | --- | --- | --- |
| GenAI Gateway | Xeon | Yes (LiteLLM model update) | None (`/v1/rerank` works) |
| GenAI Gateway | Gaudi | No | `INFERENCE_BACKEND=tei` |
| Keycloak / APISIX | Xeon | No | None |
| Keycloak / APISIX | Gaudi | No | None |

Root cause: Gaudi uses the TEI (Text Embeddings Inference) backend which serves endpoints without the /v1 prefix (/embeddings, /rerank). Xeon uses vLLM which requires the /v1 prefix.

Fix added in this commit: A new INFERENCE_BACKEND env var (vllm by default, set to tei for Gaudi). When tei, all three services (embedding, retrieval, LLM) drop the /v1 prefix from their endpoint URLs.

For Gaudi + GenAI Gateway users, add this to `.env`:

```
INFERENCE_BACKEND=tei
```

All other deployments (Xeon, Keycloak/APISIX) keep the default INFERENCE_BACKEND=vllm and are unaffected.
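The prefix rule can be sketched as a small URL helper (a minimal illustration; `build_url` and its arguments are hypothetical names, not the actual api_client.py code):

```python
def build_url(base: str, path: str, backend: str = "vllm") -> str:
    """Join the base URL and endpoint path per INFERENCE_BACKEND.

    vLLM (Xeon) serves under the /v1 prefix; TEI (Gaudi) serves the same
    endpoints at the root.
    """
    prefix = "/v1" if backend == "vllm" else ""
    return f"{base.rstrip('/')}{prefix}/{path.lstrip('/')}"

# Xeon  (vllm): https://api.example.com/v1/embeddings
# Gaudi (tei):  https://api.example.com/embeddings
```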

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
When LLM_API_ENDPOINT is set (APISIX/Keycloak), always keep /v1 prefix
regardless of INFERENCE_BACKEND. Only drop /v1 for GenAI Gateway + Gaudi
where LiteLLM itself handles the routing without the /v1 prefix.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
@arpannookala-12
Contributor Author

arpannookala-12 commented Apr 6, 2026

Reranker fixes: batching + token overflow (latest commit a7d599e)

Two root causes were found and fixed for the 500 errors seen when uploading large documents:


1. Batch size overflow (413 Payload Too Large)

TOP_K_FUSION=50 was sending all 50 fusion candidates in a single rerank request. bge-reranker-base has a max batch size of 32 — anything over that returned a 413.

Fix: Added RERANKER_MAX_BATCH_SIZE config (default 32) to retrieval/config.py and docker-compose.yml. rerank_pairs() now loops over batches of that size, tracking index offsets so scores are written back to the correct positions in the full result list.


2. Token length overflow (500 EngineCore)

After the batch fix, batch 1 (32 docs) succeeded but batch 2 consistently returned 500 EngineCore encountered an issue. Root cause: bge-reranker-base has a 512-token max sequence length for query + document combined. Technical document chunks (code, numbers, punctuation) tokenize at ~2 chars/token in the worst case — at 1000-char truncation, batch 2 docs were pushing past the limit.

Fix: Truncate each document to 500 chars (~125 tokens) before sending to the reranker. This leaves safe headroom for the query and worst-case tokenization while staying within the model's limit. Tested at 1000 chars — batch 2 still failed. 500 chars — both batches return 200 OK consistently.

On quality: 500 chars captures the leading context of each chunk, which is what cross-encoders weight most heavily for relevance scoring. The alternative (intermittent 500s on certain document types) is a harder quality regression.


Both fixes are tested end-to-end on Xeon + GenAI Gateway and Xeon + Keycloak with an 8MB document upload.
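Together, the two fixes amount to a batched, truncated rerank loop. A minimal sketch (`rerank_all` and the `rerank_batch` callable are illustrative stand-ins for the actual HTTP request logic in `rerank_pairs()`):

```python
RERANKER_MAX_BATCH_SIZE = 32   # bge-reranker-base max batch size
DOC_TRUNCATE_CHARS = 500       # ~125 tokens at a worst case of ~2 chars/token

def rerank_all(query, documents, rerank_batch):
    """Score every candidate, preserving positions in the full result list."""
    scores = [0.0] * len(documents)
    for start in range(0, len(documents), RERANKER_MAX_BATCH_SIZE):
        # truncate each doc so query + doc stays under the 512-token limit
        batch = [doc[:DOC_TRUNCATE_CHARS]
                 for doc in documents[start:start + RERANKER_MAX_BATCH_SIZE]]
        for offset, score in enumerate(rerank_batch(query, batch)):
            scores[start + offset] = score  # write back at the original index
    return scores
```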

Two issues were causing 500 errors when reranking over large uploads:

1. Batch size overflow (413): TOP_K_FUSION=50 sent all 50 candidates in
   a single rerank request, exceeding bge-reranker-base's max batch size.
   Fixed by adding RERANKER_MAX_BATCH_SIZE config (default 32) and
   looping over batches in rerank_pairs(). Index offsets are tracked so
   scores are written back to the correct positions in the full list.

2. Token length overflow (500 EngineCore): Technical document chunks
   tokenize at ~2 chars/token in worst case. At 1000-char truncation
   some docs in batch 2 exceeded the model's 512-token max sequence
   length (query + doc combined). Reduced truncation to 500 chars
   (~125 tokens), leaving safe headroom for the query and worst-case
   tokenization while preserving the leading context most relevant for
   reranking quality.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
@arpannookala-12 force-pushed the cld2labs/HybridSearch branch from a7d599e to 5524187 on April 6, 2026 at 17:43
Clarify that MODEL_ENDPOINT values differ by deployment type:
- Xeon + Keycloak/APISIX: APISIX route name with -vllmcpu suffix
  (e.g. bge-base-en-v1.5-vllmcpu, bge-reranker-base-vllmcpu)
- Xeon + GenAI Gateway / Gaudi: HuggingFace model ID

Update APISIX endpoint URL examples in .env.example to use -vllmcpu
route names. Add deployment-type comparison table to README Configure
Models section.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
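A hedged `.env` sketch of the two styles described in the commit (the `*_MODEL_ENDPOINT` variable names here are illustrative stand-ins; the actual names are in .env.example):

```
# Xeon + Keycloak/APISIX: APISIX route names with the -vllmcpu suffix
EMBEDDING_MODEL_ENDPOINT=bge-base-en-v1.5-vllmcpu
RERANKER_MODEL_ENDPOINT=bge-reranker-base-vllmcpu

# Xeon + GenAI Gateway / Gaudi: HuggingFace model IDs instead
# EMBEDDING_MODEL_ENDPOINT=bge-base-en-v1.5
# RERANKER_MODEL_ENDPOINT=bge-reranker-base
```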
Collaborator

@alexsin368 left a comment


Suggested changes for the user to adjust the embedding batch size for the embedding and ingestion services; this ensures the same value is used in both.

…c fixes

api_client.py (retrieval):
- Separate rerank payload by backend: Keycloak/APISIX uses "texts",
  GenAI Gateway uses "documents" — each backend expects its own field
- Add logger.info for raw reranker response per batch
- Clarify response format comments (Format 1 vs Format 2)

ingestion/config.py + main.py:
- Add embedding_batch_size config (default 32, must match embedding service)
- Use settings.embedding_batch_size instead of hardcoded 32 in main.py
- Log the batch size at start of embedding loop

docker-compose.yml + .env.example:
- Pass EMBEDDING_BATCH_SIZE to ingestion service so users can tune it
- Add EMBEDDING_BATCH_SIZE to .env.example with note to reduce for
  larger documents

reranker-configuration.md:
- Step 2: clarify TOKEN source (GenAI Gateway vault.yml, not Keycloak)
- Step 2: define BASE_URL with /v1 path so curl commands use /rerank
- Steps 3 + 7: update curl to use ${BASE_URL}/rerank
- Step 3: add note on "documents" vs "texts" field by deployment type
- Step 7: add Keycloak/APISIX response format (flat array) alongside
  GenAI Gateway format (nested results)

README.md:
- Replace docker-compose with docker compose throughout
- Expand log-checking section with per-service startup verification
  commands

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
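The backend-dependent rerank payload from this commit can be sketched as follows (a hypothetical helper; only the field names and the TEI model-name requirement come from the changes themselves):

```python
def rerank_payload(query: str, docs: list, deployment: str = "genai_gateway",
                   model: str = None) -> dict:
    """Build a rerank request body for the given deployment type.

    Keycloak/APISIX (vLLM) expects the candidates under "texts";
    the GenAI Gateway path expects "documents".
    """
    field = "texts" if deployment == "keycloak" else "documents"
    payload = {"query": query, field: docs}
    if model is not None:
        payload["model"] = model  # TEI requires the model name in the payload
    return payload
```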
BASE_URL must remain without /v1 because Steps 4 and 5 use the same
variable for LiteLLM admin endpoints (/model/info, /model/update)
which have no /v1 prefix. The inference curl commands correctly use
${BASE_URL}/v1/rerank explicitly.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
Prevents bandit from scanning the HybridSearch dataset venv which
causes internal errors on Python 3.14 bytecode files.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
Adds Trivy (vuln/misconfig/secret), Bandit, and ShellCheck scans
scoped to the HybridSearch sample solution. Runs on PR open/sync
and push to main/dev, with workflow_dispatch support for manual
PR scans.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
This reverts commit 33f85a1.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
@arpannookala-12 force-pushed the cld2labs/HybridSearch branch from 6c05cca to a6de873 on April 7, 2026 at 21:17
GitHub Actions only picks up workflows from .github/workflows at the
repository root. Moves the SDLE scan workflow out of the
sample_solutions/HybridSearch subdirectory so it runs correctly.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
@alexsin368
Copy link
Copy Markdown
Collaborator

@arpannookala-12 All security scans passed. You may remove code-scans.yaml.

All Trivy, Bandit, and ShellCheck scans passed successfully.
Removing the workflow file as it is no longer needed on this branch.

Signed-off-by: arpannookala-12 <ganesh.arpan.nookala@cloud2labs.com>
@arpannookala-12
Copy link
Copy Markdown
Contributor Author

@alexsin368 Done, the code-scans file has been removed.
