Closed
Signed-off-by: Ruoyu Ying <ruoyu.ying@intel.com> Co-authored-by: sdp <sdp@b49691d6a5d8.jf.intel.com>
* Refine clip embedding Signed-off-by: lvliang-intel <liang1.lv@intel.com>
… licenses (opea-project#1247) Signed-off-by: Patil, Jitendra <jitendra.patil@intel.com>
* Fix bug in HuggingFaceEndpoint usage 1. Upgrade langchain huggingface from community to partner (community deprecated) Added task=text-generation argument to fix error with tgi_endpoint Signed-off-by: rbrugaro <rita.brugarolas.brufau@intel.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: rbrugaro <rita.brugarolas.brufau@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
Signed-off-by: dmsuehir <dina.s.jones@intel.com> Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
* Create token metrics only when they are available This avoids generation of useless token/request histogram metrics for services that use Orchestrator class, but never call its token processing functionality. (Helps in differentiating frontend megaservice metrics from backend megaservice ones, especially when multiple OPEA applications run in the same cluster.) Also change Orchestrator CI test workaround to use unique prefix for each metric instance, instead of metrics being (singleton) class variables. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com> * Add locking for latency metric creation / method change, as these could be called from multiple request handling threads. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com> --------- Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com> Co-authored-by: Malini Bhandaru <malini.bhandaru@intel.com>
Signed-off-by: minmin-intel <minmin.hou@intel.com> Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
1. Fix template name in README 2. Fix invalid release name Signed-off-by: ZePan110 <ze.pan@intel.com>
* fix web-retrievers hub client and tei endpoint issue Signed-off-by: Spycsh <sihan.chen@intel.com>
…cards for LLM microservice (opea-project#1267) * Update README.md for Deepseek support and numbers of required gaudi cards Signed-off-by: Tsai, Louie <louie.tsai@intel.com> * Update README.md Signed-off-by: Tsai, Louie <louie.tsai@intel.com> --------- Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
* working README for CLI and compose Signed-off-by: Daniel Deleon <daniel.de.leon@intel.com> * update for direct python execution Signed-off-by: Daniel Deleon <daniel.de.leon@intel.com> * fix formatting Signed-off-by: Daniel Deleon <daniel.de.leon@intel.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * bring back depends_on condition Signed-off-by: Daniel Deleon <daniel.de.leon@intel.com> --------- Signed-off-by: Daniel Deleon <daniel.de.leon@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
* Fix Dataprep Ingest Data Issue. Trace: 1. The update of `langchain_huggingface.HuggingFaceEndpointEmbeddings` caused embedding vectors of the wrong size. 2. The wrong-size vectors are saved into the Redis database, and the indices are not created correctly. 3. For the reasons above, the retriever cannot retrieve data from Redis using the index. 4. As a result, RAG appears not to work, because the uploaded file cannot be retrieved from the database. Solution: Replace all uses of `langchain_huggingface.HuggingFaceEndpointEmbeddings` with `langchain_community.embeddings.HuggingFaceInferenceAPIEmbeddings`, and modify related READMEs and scripts. Related issue: - opea-project/GenAIExamples#1473 - opea-project/GenAIExamples#1482 --------- Signed-off-by: letonghan <letong.han@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…um, remove vllm hpu triton version fix (opea-project#1275) * Fix langchain and huggingface version to avoid bug Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
…oject#1280) Signed-off-by: Spycsh <sihan.chen@intel.com>
Signed-off-by: ZePan110 <ze.pan@intel.com>
* Fix VDMS retrieval issue Signed-off-by: lvliang-intel <liang1.lv@intel.com>
…ge names with examples (opea-project#1284) Align mongo related chathistory/feedbackmanagement/promptregistry image names with examples Signed-off-by: Spycsh <sihan.chen@intel.com> Co-authored-by: Liang Lv <liang1.lv@intel.com>
…ent passing (opea-project#1288) * Fix dataprep ingest error handling and skip_ingestion argument passing in dataprep neo4j integration Signed-off-by: rbrugaro <rita.brugarolas.brufau@intel.com>
* first code for multi-turn Signed-off-by: minmin-intel <minmin.hou@intel.com> * test redispersistence Signed-off-by: minmin-intel <minmin.hou@intel.com> * integrate persistent store in react llama Signed-off-by: minmin-intel <minmin.hou@intel.com> * test multi-turn Signed-off-by: minmin-intel <minmin.hou@intel.com> * multiturn for assistants api and chatcompletion api Signed-off-by: minmin-intel <minmin.hou@intel.com> * update readme and ut script Signed-off-by: minmin-intel <minmin.hou@intel.com> * update readme and ut scripts Signed-off-by: minmin-intel <minmin.hou@intel.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix bug Signed-off-by: minmin-intel <minmin.hou@intel.com> * change memory type naming Signed-off-by: minmin-intel <minmin.hou@intel.com> * fix with_memory as str Signed-off-by: minmin-intel <minmin.hou@intel.com> --------- Signed-off-by: minmin-intel <minmin.hou@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix the retriever issue of Milvus DB that data can not be retrieved after ingested using dataprep. Signed-off-by: letonghan <letong.han@intel.com> --------- Signed-off-by: letonghan <letong.han@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix Qdrant retriever no retrieved result issue. Signed-off-by: letonghan <letong.han@intel.com>
Get the latest vLLM stable version. Signed-off-by: Wang, Xigui <xigui.wang@intel.com>
…#1290) * Fix telemetry connection issue when disabling telemetry - use ENABLE_OPEA_TELEMETRY to control whether to enable OpenTelemetry, default false - fix the issue where logs show a telemetry connection error on every request when telemetry is disabled - prevent the above error from propagating to microservices when telemetry is disabled Signed-off-by: Spycsh <sihan.chen@intel.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix UT failure in tests that require the flag to be on * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Spycsh <sihan.chen@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…opea-project#1298) * Refine CLI embedding microservice using dependency Signed-off-by: lvliang-intel <liang1.lv@intel.com>
1. Set default session_id for react_langchain strategy, because of the langchain version upgrade. 2. Fix request message format
Signed-off-by: lvliang-intel <liang1.lv@intel.com> Co-authored-by: Letong Han <106566639+letonghan@users.noreply.github.com>
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Liang Lv <liang1.lv@intel.com>
* Refine dataprep Milvus CI Signed-off-by: letonghan <letong.han@intel.com>
… place of local config files
… 'semantic_router' module
Signed-off-by: madison-evans <madison.evans@intel.com>
Signed-off-by: madison-evans <madison.evans@intel.com>
…controller on runtime. Signed-off-by: madison-evans <madison.evans@intel.com>
Signed-off-by: madison-evans <madison.evans@intel.com>
Signed-off-by: madison-evans <madison.evans@intel.com>
…ntroller. Now switchable
Description
This PR adds a new modular component (router) to the GenAIComps infrastructure. The router is designed to direct prompts to different downstream LLM endpoints based on prompt complexity or semantic characteristics.
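As a rough illustration of complexity-based routing, the sketch below loosely follows the shape of the matrix-factorization (MF) scorer described for RouteLLM: each model gets a bilinear score against the prompt, the strong model's win probability is the sigmoid of the score difference, and a threshold picks the endpoint. This is not the component's implementation. The function names, endpoint URLs, and toy vectors are hypothetical, and `prompt_proj` is assumed to be a prompt embedding that has already been passed through a learned projection.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mf_score(model_vec: Vector, prompt_proj: Vector, w2: Vector) -> float:
    """Bilinear score: w2 . (model_vec * prompt_proj), '*' is element-wise."""
    return _dot(w2, [m * p for m, p in zip(model_vec, prompt_proj)])


def strong_win_probability(strong: Vector, weak: Vector,
                           prompt_proj: Vector, w2: Vector) -> float:
    """Sigmoid of the strong-minus-weak score difference."""
    diff = mf_score(strong, prompt_proj, w2) - mf_score(weak, prompt_proj, w2)
    return 1.0 / (1.0 + math.exp(-diff))


def route(p_strong: float, threshold: float,
          strong_url: str, weak_url: str) -> str:
    """Send the prompt to the strong endpoint only when it is likely to win."""
    return strong_url if p_strong >= threshold else weak_url


# Toy, untrained parameters; a real MF router learns these from
# preference-annotated data such as gpt4_judge_battles.
p = strong_win_probability([1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [1.0, 1.0])
print(route(p, 0.5, "http://strong-llm:8000", "http://weak-llm:8000"))
```

Raising the threshold sends fewer prompts to the (typically more expensive) strong endpoint, so the threshold acts as the cost/quality knob.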
The router supports multiple controller instances:
RouteLLM: a matrix factorization-based router trained on preference-annotated datasets (e.g., gpt4_judge_battles)
Semantic Router: an embedding similarity-based router for simple threshold-based prompt classification
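The semantic path can be pictured as nearest-exemplar classification with a cutoff. The sketch below assumes the prompt and each route's example utterances have already been embedded (for instance through a HuggingFace embedding endpoint); `cosine`, `classify`, and the route names are hypothetical and do not reflect the component's actual code or the API of any particular semantic-router library.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def classify(prompt_emb: Sequence[float],
             routes: Dict[str, List[Sequence[float]]],
             threshold: float) -> Optional[str]:
    """Return the route whose closest exemplar is most similar to the prompt,
    or None if even the best match falls below the threshold."""
    best_name: Optional[str] = None
    best_sim = -1.0
    for name, exemplars in routes.items():
        for ex in exemplars:
            sim = cosine(prompt_emb, ex)
            if sim > best_sim:
                best_name, best_sim = name, sim
    return best_name if best_sim >= threshold else None


routes = {"complex": [[1.0, 0.0]], "simple": [[0.0, 1.0]]}
print(classify([0.9, 0.1], routes, threshold=0.8))
```

A `None` result would leave the choice to a default endpoint; how the actual controller handles the no-match case is not stated in this PR.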
Configuration is centralized via config.yaml and mounted per controller. The router is deployable via both Docker Compose and Kubernetes.
Issues
n/a (this is a new component addition)
Type of change
New feature (non-breaking change which adds new functionality)
Others (enhancement, validation, modularity)
Dependencies
Adds controller-specific config YAMLs: routellm_config.yaml, semantic_router_config.yaml
Uses existing base dependencies (pydantic, fastapi, etc.) already supported in the project
Optionally requires access to HuggingFace embeddings and OpenAI APIs, with credentials supplied via secrets