Router Microservice by madison-evans · Pull Request #1563 · opea-project/GenAIComps

madison-evans · 2025-04-14T14:49:44Z

Description
This PR adds a new modular component (router) to the GenAIComps infrastructure. The router is designed to direct prompts to different downstream LLM endpoints based on prompt complexity or semantic characteristics.

The router supports multiple controller instances:

RouteLLM: a matrix factorization-based router trained on preference-annotated datasets (e.g., gpt4_judge_battles)

Semantic Router: an embedding similarity-based router for simple threshold-based prompt classification
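
To make the threshold-based idea concrete, here is a minimal sketch of embedding-similarity routing. It is not the PR's implementation: the function names, the route labels ("strong", "weak"), the toy 3-dimensional vectors, and the 0.8 threshold are all assumptions chosen for illustration. A real deployment would get its vectors from an embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(prompt_vec, route_examples, threshold, default="weak"):
    """Pick the route whose closest example is most similar to the prompt.

    Falls back to `default` when no route reaches `threshold`.
    All names here are hypothetical, not taken from the PR.
    """
    best_route, best_score = default, threshold
    for name, examples in route_examples.items():
        score = max(cosine(prompt_vec, e) for e in examples)
        if score >= best_score:
            best_route, best_score = name, score
    return best_route


# Toy example vectors standing in for embeddings of sample prompts per route.
ROUTES = {
    "strong": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "weak": [[0.0, 1.0, 0.0]],
}

if __name__ == "__main__":
    print(route([0.95, 0.05, 0.0], ROUTES, threshold=0.8))
    print(route([0.0, 0.0, 1.0], ROUTES, threshold=0.8, default="fallback"))
```

The first call lands on the "strong" route because the prompt vector is nearly parallel to one of its examples. The second matches nothing above the threshold and returns the fallback.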

Configuration is centralized via config.yaml and mounted per controller. The router is deployable via both Docker Compose and Kubernetes.
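
As an illustration of per-controller configuration, the sketch below maps a controller name to its config file and picks the controller from an environment variable. The two file names come from the Dependencies section of this PR. The environment variable name `CONTROLLER_TYPE`, the dictionary keys, and the default value are assumptions, not confirmed by the PR.

```python
import os

# File names are from the PR description; the keys are hypothetical labels.
CONFIG_BY_CONTROLLER = {
    "routellm": "routellm_config.yaml",
    "semantic_router": "semantic_router_config.yaml",
}


def resolve_config(env=None, default="routellm"):
    """Return (controller, config_file) chosen via an environment mapping.

    `CONTROLLER_TYPE` is an assumed variable name used only for this sketch.
    """
    env = os.environ if env is None else env
    controller = env.get("CONTROLLER_TYPE", default).strip().lower()
    if controller not in CONFIG_BY_CONTROLLER:
        known = ", ".join(sorted(CONFIG_BY_CONTROLLER))
        raise ValueError(f"unknown controller {controller!r}; expected one of: {known}")
    return controller, CONFIG_BY_CONTROLLER[controller]


if __name__ == "__main__":
    print(resolve_config({"CONTROLLER_TYPE": "semantic_router"}))
```

Passing the environment as a plain mapping keeps the function easy to test without touching the real process environment.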

Issues
n/a — this is a new component addition

Type of change
New feature (non-breaking change which adds new functionality)

Others (enhancement, validation, modularity)

Dependencies
Adds controller-specific config YAMLs: routellm_config.yaml, semantic_router_config.yaml

Uses existing base dependencies (pydantic, fastapi, etc.) already supported in the project

Optionally depends on access to HuggingFace embeddings and OpenAI APIs, via secrets

* Fix bug in HuggingFaceEndpoint usage
  1. Upgrade langchain huggingface from community to partner (community deprecated)
  Added task=text-generation argument to fix error with tgi_endpoint
  Signed-off-by: rbrugaro <rita.brugarolas.brufau@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Signed-off-by: rbrugaro <rita.brugarolas.brufau@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>

* Create token metrics only when they are available
  This avoids generation of useless token/request histogram metrics for services that use the Orchestrator class but never call its token processing functionality. (Helps in differentiating frontend megaservice metrics from backend megaservice ones, especially when multiple OPEA applications run in the same cluster.)
  Also change the Orchestrator CI test workaround to use a unique prefix for each metric instance, instead of metrics being (singleton) class variables.
  Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
* Add locking for latency metric creation / method change
  As that could be called from multiple request handling threads.
  Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
---------
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Co-authored-by: Malini Bhandaru <malini.bhandaru@intel.com>

…cards for LLM microservice (opea-project#1267)
* Update README.md for Deepseek support and numbers of required gaudi cards
  Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
* Update README.md
  Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
---------
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

* working README for CLI and compose
  Signed-off-by: Daniel Deleon <daniel.de.leon@intel.com>
* update for direct python execution
  Signed-off-by: Daniel Deleon <daniel.de.leon@intel.com>
* fix formatting
  Signed-off-by: Daniel Deleon <daniel.de.leon@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* bring back depends_on condition
  Signed-off-by: Daniel Deleon <daniel.de.leon@intel.com>
---------
Signed-off-by: Daniel Deleon <daniel.de.leon@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>

* Fix Dataprep Ingest Data Issue.
  Trace:
  1. The update of `langchain_huggingface.HuggingFaceEndpointEmbeddings` caused the wrong size of embedding vectors.
  2. Wrong-size vectors are saved into the Redis database, and the indices are not created correctly.
  3. The retriever cannot retrieve data from Redis using the index due to the reasons above.
  4. The RAG then seems to `not work`, because the uploaded file cannot be retrieved from the database.
  Solution: Replace all uses of `langchain_huggingface.HuggingFaceEndpointEmbeddings` with `langchain_community.embeddings.HuggingFaceInferenceAPIEmbeddings`, and modify related READMEs and scripts.
  Related issues:
  - opea-project/GenAIExamples#1473
  - opea-project/GenAIExamples#1482
---------
Signed-off-by: letonghan <letong.han@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

…ge names with examples (opea-project#1284) Align mongo related chathistory/feedbackmanagement/promptregistry image names with examples Signed-off-by: Spycsh <sihan.chen@intel.com> Co-authored-by: Liang Lv <liang1.lv@intel.com>

* first code for multi-turn
  Signed-off-by: minmin-intel <minmin.hou@intel.com>
* test redispersistence
  Signed-off-by: minmin-intel <minmin.hou@intel.com>
* integrate persistent store in react llama
  Signed-off-by: minmin-intel <minmin.hou@intel.com>
* test multi-turn
  Signed-off-by: minmin-intel <minmin.hou@intel.com>
* multiturn for assistants api and chatcompletion api
  Signed-off-by: minmin-intel <minmin.hou@intel.com>
* update readme and ut script
  Signed-off-by: minmin-intel <minmin.hou@intel.com>
* update readme and ut scripts
  Signed-off-by: minmin-intel <minmin.hou@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix bug
  Signed-off-by: minmin-intel <minmin.hou@intel.com>
* change memory type naming
  Signed-off-by: minmin-intel <minmin.hou@intel.com>
* fix with_memory as str
  Signed-off-by: minmin-intel <minmin.hou@intel.com>
---------
Signed-off-by: minmin-intel <minmin.hou@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix the Milvus DB retriever issue where data cannot be retrieved after being ingested using dataprep.
  Signed-off-by: letonghan <letong.han@intel.com>
---------
Signed-off-by: letonghan <letong.han@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

…#1290)
* Fix telemetry connection issue when disabling telemetry
  - use ENABLE_OPEA_TELEMETRY to control whether to enable open telemetry, default false
  - fix the issue that logs always show telemetry connection error with each request when telemetry is disabled
  - ban the above error propagation to microservices when telemetry is disabled
  Signed-off-by: Spycsh <sihan.chen@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Fix ut failure where required the flag to be on
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Signed-off-by: Spycsh <sihan.chen@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Ruoyu-y and others added 30 commits January 28, 2025 16:14

doc: fix minor issue in vllm doc (opea-project#1242)

bea2e00

Signed-off-by: Ruoyu Ying <ruoyu.ying@intel.com> Co-authored-by: sdp <sdp@b49691d6a5d8.jf.intel.com>

Refine CLIP embedding (opea-project#1245)

45c5867

* Refine clip embedding Signed-off-by: lvliang-intel <liang1.lv@intel.com>

Update LEGAL_INFORMATION.md about software subject to non-open source…

5e7ae79

… licenses (opea-project#1247) Signed-off-by: Patil, Jitendra <jitendra.patil@intel.com>

Fix port in the data prep redis README file (opea-project#1250)

1a2c385

Signed-off-by: dmsuehir <dina.s.jones@intel.com> Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>

Add Dockerfile for comps-base image (opea-project#1127)

b5bba40

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

fix tei embedding and tei reranking bug (opea-project#1256)

4ad2755

Signed-off-by: minmin-intel <minmin.hou@intel.com> Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>

Fix CD test issue. (opea-project#1263)

b5925fa

1.Fix template name in README 2.Fix invalid release name Signed-off-by: ZePan110 <ze.pan@intel.com>

Fix web-retrievers hub client and tei endpoint issue (opea-project#1270)

1ed2f23

* fix web-retrievers hub client and tei endpoint issue Signed-off-by: Spycsh <sihan.chen@intel.com>

Fix langchain and huggingface version to avoid bug in FaqGen and DocS…

b307cc8

…um, remove vllm hpu triton version fix (opea-project#1275) * Fix langchain and huggingface version to avoid bug Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>

update default service list (opea-project#1276)

7209d58

Signed-off-by: chensuyue <suyue.chen@intel.com>

fix metric id issue when init multiple Orchestrator instance (opea-pr…

3d08a3a

…oject#1280) Signed-off-by: Spycsh <sihan.chen@intel.com>

Fix Build latest images on push event workflow (opea-project#1282)

527b2c5

Signed-off-by: ZePan110 <ze.pan@intel.com>

Fix VDMS retrieval issue (opea-project#1252)

34163bb

* Fix VDMS retrieval issue Signed-off-by: lvliang-intel <liang1.lv@intel.com>

Bug Fix neo4j dataprep ingest error handling and skip_ingestion argum…

5476aab

…ent passing (opea-project#1288) * Fix dataprep ingest error handling and skip_ingestion argument passing in dataprep neo4j integration Signed-off-by: rbrugaro <rita.brugarolas.brufau@intel.com>

Fix Qdrant retriever RAG issue. (opea-project#1289)

ee84222

* Fix Qdrant retriever no retrieved result issue. Signed-off-by: letonghan <letong.han@intel.com>

update vLLM CPU to latest tag (opea-project#1285)

59b8a04

Get the latest vLLM stable version. Signed-off-by: Wang, Xigui <xigui.wang@intel.com>

Refine CLIP embedding microservice by leveraging the third-party CLIP (…

1268584

…opea-project#1298) * Refine CLI embedding microservice using dependency Signed-off-by: lvliang-intel <liang1.lv@intel.com>

fix agent message format. (opea-project#1297)

8e068c7

1. set default session_id for react_langchain strategy, because the langchain version upgrade. 2. fix request message format

Fix milvus dataprep ingest files failure (opea-project#1299)

1e3316a

Signed-off-by: lvliang-intel <liang1.lv@intel.com> Co-authored-by: Letong Han <106566639+letonghan@users.noreply.github.com>

Bump transformers (opea-project#1278)

294a906

Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Liang Lv <liang1.lv@intel.com>

Refine dataprep test scripts (opea-project#1305)

4d2be60

* Refine dataprep Milvus CI Signed-off-by: letonghan <letong.han@intel.com>

madison-evans added 25 commits May 12, 2025 12:41

controllers directory copied over to integrations directory

93571ac

initial Dockerfile created

f7c4115

opea_router_microservice.py code

f8b58fc

README for src, compose.yaml, and kubernetes yaml files established

daaa165

added readme for the kubernetes subdirectory

ad09f61

created routellm-configmap.yaml and semantic-router-configmap.yaml in…

1f07487

… place of local config files

added deployment script for docker_compose test scripts

19ec62b

added missing modules to requirements.txt and fixed some bugs

fc4ffcf

modified python to use full paths in imports

ea0cb24

updated absolute paths in imports and directory names for conflicting…

cde37c9

… 'semantic_router' module

added test script and aligned src python files

9a1bf09

update to semantic-router-configmap.yaml

b2fc5d4

added TODOs

024f07b

modified files for configmap in k8s directory

1fc69f3

updated absolute paths in module imports

1f4da32

Merge branch 'router' of https://github.com/SAPD-Intel/GenAIComps int…

a8295f6

…o router

removed kubernetes directory from deployment directory of router comp

853ae89

Signed-off-by: madison-evans <madison.evans@intel.com>

updated controller type to use env variable

7900f94

Signed-off-by: madison-evans <madison.evans@intel.com>

: now uses both config path but using environment variable to switch …

e4dc129

…controller on runtime. Signed-off-by: madison-evans <madison.evans@intel.com>

: Make encoder name configurable

612155e

Signed-off-by: madison-evans <madison.evans@intel.com>

OPEA telemetry added to both routellm and semantic_router controllers

b99b026

Signed-off-by: madison-evans <madison.evans@intel.com>

routellm_config.yaml update

9205345

Make embedding model configurable and support HF/OpenAI in RouteLLMCo…

0d77831

…ntroller. Now switchable

Allow endpoint, model_id & controller overrides via env in docker-com…

170a9b1

…pose

Merge branch 'router' of https://github.com/SAPD-Intel/GenAIComps int…

6bc8429

…o router

madison-evans closed this May 12, 2025

madison-evans force-pushed the router branch from c3d5123 to 6bc8429 Compare May 12, 2025 14:43

This was referenced May 19, 2025

Routing Service #1709

Closed

Routing service #1716

Merged

joshuayao removed a link to an issue May 27, 2025

[Feature] RouteLLM #936

Closed

madison-evans wants to merge 998 commits into opea-project:main from SAPD-Intel:router

20 participants
