Router Microservice#1563

Closed
madison-evans wants to merge 998 commits into opea-project:main from SAPD-Intel:router

Conversation

@madison-evans
Contributor

Description
This PR adds a new modular component (router) to the GenAIComps infrastructure. The router is designed to direct prompts to different downstream LLM endpoints based on prompt complexity or semantic characteristics.

The router supports multiple controller instances:

RouteLLM: a matrix factorization-based router trained on preference-annotated datasets (e.g., gpt4_judge_battles)

Semantic Router: an embedding similarity-based router for simple threshold-based prompt classification
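
The threshold-based idea behind the Semantic Router can be sketched as follows. This is a minimal illustration, not the code in this PR: the function names, the `strong`/`weak` endpoint labels, the 0.8 threshold, and the toy two-dimensional vectors standing in for real embeddings are all assumptions made for the example.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def route_prompt(
    prompt_embedding: Vector,
    strong_route_examples: Sequence[Vector],
    threshold: float = 0.8,  # hypothetical default, not taken from the PR
) -> str:
    """Return "strong" if the prompt is close enough to any reference
    embedding for the strong route, otherwise "weak"."""
    best = max(
        (cosine_similarity(prompt_embedding, ex) for ex in strong_route_examples),
        default=0.0,
    )
    return "strong" if best >= threshold else "weak"


# Toy vectors standing in for real embedding-model output.
references = [[0.9, 0.1]]
print(route_prompt([1.0, 0.0], references))  # close to the reference
print(route_prompt([0.0, 1.0], references))  # far from the reference
```

In a real deployment the embeddings would come from an embedding model, and the returned label would map to a downstream LLM endpoint.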

Configuration is centralized via config.yaml and mounted per controller. The router is deployable via both Docker Compose and Kubernetes.
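
As a rough sketch of how a central config could select a per-controller config file: the schema below (the `controller` and `controllers` keys and the `config_path` field) is hypothetical and only illustrates the idea. The PR does not show the actual layout of config.yaml, so the parsed contents are written as a plain Python dict.

```python
from dataclasses import dataclass

# Hypothetical parsed contents of the central config.yaml.
# The real schema is not shown in this PR.
EXAMPLE_CONFIG = {
    "controller": "semantic_router",
    "controllers": {
        "routellm": {"config_path": "routellm_config.yaml"},
        "semantic_router": {"config_path": "semantic_router_config.yaml"},
    },
}


@dataclass(frozen=True)
class ControllerSelection:
    name: str
    config_path: str


def select_controller(config: dict) -> ControllerSelection:
    """Resolve the active controller and the path of its own config file."""
    name = config["controller"]
    controllers = config.get("controllers", {})
    if name not in controllers:
        raise ValueError(f"Unknown controller: {name!r}")
    return ControllerSelection(name=name, config_path=controllers[name]["config_path"])


selection = select_controller(EXAMPLE_CONFIG)
print(selection.name, selection.config_path)
```

Keeping the controller choice in one place means a deployment can switch controllers by changing a single key and mounting the matching file.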

Issues
n/a — this is a new component addition

Type of change
New feature (non-breaking change which adds new functionality)

Others (enhancement, validation, modularity)

Dependencies
Adds controller-specific config YAMLs: routellm_config.yaml, semantic_router_config.yaml

Uses existing base dependencies (pydantic, fastapi, etc.) already supported in the project

Optionally depends on access to HuggingFace embeddings and OpenAI APIs, via secrets

Ruoyu-y and others added 30 commits January 28, 2025 16:14
Signed-off-by: Ruoyu Ying <ruoyu.ying@intel.com>
Co-authored-by: sdp <sdp@b49691d6a5d8.jf.intel.com>
* Refine clip embedding

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
… licenses (opea-project#1247)

Signed-off-by: Patil, Jitendra <jitendra.patil@intel.com>
* Fix bug in HuggingFaceEndpoint usage

	1. Upgrade langchain huggingface from community to partner (community deprecated)
Added task=text-generation argument to fix error with tgi_endpoint

Signed-off-by: rbrugaro <rita.brugarolas.brufau@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: rbrugaro <rita.brugarolas.brufau@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
* Create token metrics only when they are available

This avoids generation of useless token/request histogram metrics
for services that use Orchestrator class, but never call its token
processing functionality.

(Helps in differentiating frontend megaservice metrics from backend
megaservice ones, especially when multiple OPEA applications run in
the same cluster.)

Also change Orchestrator CI test workaround to use unique prefix for
each metric instance, instead of metrics being (singleton) class
variables.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

* Add locking for latency metric creation / method change

As that could be called from multiple request handling threads.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

---------

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Co-authored-by: Malini Bhandaru <malini.bhandaru@intel.com>
Signed-off-by: minmin-intel <minmin.hou@intel.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
1.Fix template name in README
2.Fix invalid release name

Signed-off-by: ZePan110 <ze.pan@intel.com>
* fix web-retrievers hub client and tei endpoint issue

Signed-off-by: Spycsh <sihan.chen@intel.com>
…cards for LLM microservice (opea-project#1267)

* Update README.md for Deepseek support and numbers of required gaudi cards

Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

* Update README.md

Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

---------

Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
* working README for CLI and compose

Signed-off-by: Daniel Deleon <daniel.de.leon@intel.com>

* update for direct python execution

Signed-off-by: Daniel Deleon <daniel.de.leon@intel.com>

* fix formatting

Signed-off-by: Daniel Deleon <daniel.de.leon@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bring back depends_on condition

Signed-off-by: Daniel Deleon <daniel.de.leon@intel.com>

---------

Signed-off-by: Daniel Deleon <daniel.de.leon@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
* Fix Dataprep Ingest Data Issue.

Trace:
1. The update of `langchain_huggingface.HuggingFaceEndpointEmbeddings` caused embedding vectors of the wrong size.
2. The wrong-size vectors are saved into the Redis database, and the indices are not created correctly.
3. The retriever cannot retrieve data from Redis using the index for the
   reasons above.
4. As a result, RAG appears not to work, because the uploaded file cannot be
   retrieved from the database.

Solution:
Replace all uses of `langchain_huggingface.HuggingFaceEndpointEmbeddings`
with `langchain_community.embeddings.HuggingFaceInferenceAPIEmbeddings`,
and modify the related READMEs and scripts.

Related issue: 
- opea-project/GenAIExamples#1473
- opea-project/GenAIExamples#1482

---------

Signed-off-by: letonghan <letong.han@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…um, remove vllm hpu triton version fix (opea-project#1275)

* Fix langchain and huggingface version to avoid bug

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
* Fix VDMS retrieval issue
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
…ge names with examples (opea-project#1284)

Align mongo related chathistory/feedbackmanagement/promptregistry image names with examples 

Signed-off-by: Spycsh <sihan.chen@intel.com>
Co-authored-by: Liang Lv <liang1.lv@intel.com>
…ent passing (opea-project#1288)

* Fix dataprep ingest error handling and skip_ingestion argument passing in dataprep neo4j integration
Signed-off-by: rbrugaro <rita.brugarolas.brufau@intel.com>
* first code for multi-turn

Signed-off-by: minmin-intel <minmin.hou@intel.com>

* test redispersistence

Signed-off-by: minmin-intel <minmin.hou@intel.com>

* integrate persistent store in react llama

Signed-off-by: minmin-intel <minmin.hou@intel.com>

* test multi-turn

Signed-off-by: minmin-intel <minmin.hou@intel.com>

* multiturn for assistants api and chatcompletion api

Signed-off-by: minmin-intel <minmin.hou@intel.com>

* update readme and ut script

Signed-off-by: minmin-intel <minmin.hou@intel.com>

* update readme and ut scripts

Signed-off-by: minmin-intel <minmin.hou@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix bug

Signed-off-by: minmin-intel <minmin.hou@intel.com>

* change memory type naming

Signed-off-by: minmin-intel <minmin.hou@intel.com>

* fix with_memory as str

Signed-off-by: minmin-intel <minmin.hou@intel.com>

---------

Signed-off-by: minmin-intel <minmin.hou@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix the Milvus DB retriever issue where data cannot be retrieved
after being ingested using dataprep.

Signed-off-by: letonghan <letong.han@intel.com>

---------

Signed-off-by: letonghan <letong.han@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix Qdrant retriever no retrieved result issue.
Signed-off-by: letonghan <letong.han@intel.com>
Get the latest vLLM stable version.
Signed-off-by: Wang, Xigui <xigui.wang@intel.com>
…#1290)

* Fix telemetry connection issue when disabling telemetry

- use ENABLE_OPEA_TELEMETRY to control whether to enable open telemetry, default false
- fix the issue that logs always show telemetry connection error with each request when telemetry is disabled
- ban the above error propagation to microservices when telemetry is disabled

Signed-off-by: Spycsh <sihan.chen@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix UT failure where the flag was required to be on

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Spycsh <sihan.chen@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…opea-project#1298)

* Refine CLI embedding microservice using dependency
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
1. set default session_id for react_langchain strategy, because of the langchain version upgrade.
2. fix request message format
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: Letong Han <106566639+letonghan@users.noreply.github.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Liang Lv <liang1.lv@intel.com>
* Refine dataprep Milvus CI
Signed-off-by: letonghan <letong.han@intel.com>
Signed-off-by: madison-evans <madison.evans@intel.com>
Signed-off-by: madison-evans <madison.evans@intel.com>
…controller on runtime.

Signed-off-by: madison-evans <madison.evans@intel.com>
Signed-off-by: madison-evans <madison.evans@intel.com>
Signed-off-by: madison-evans <madison.evans@intel.com>
This was referenced May 19, 2025
@joshuayao joshuayao removed a link to an issue May 27, 2025