Create token metrics only when they are available #1092

mkbhanda merged 3 commits into opea-project:main from …
Conversation
Force-pushed from b059539 to d36281e
Rebased to main.
Why would this happen? Metrics are only updated when calling …
@Spycsh Because the Prometheus client starts providing metrics only after they've been created. In the current code, all metrics are created when Orchestrator/OrchestratorMetrics is instantiated: https://github.com/opea-project/GenAIComps/blob/main/comps/cores/mega/orchestrator.py#L33
Those methods only update the value of the metric, they do not create it. This PR delays Histogram metric creation until the first call of the update methods.
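The delayed-creation pattern described above can be sketched roughly like this (names are illustrative, not the actual orchestrator.py code; a stand-in Histogram class replaces prometheus_client.Histogram to keep the sketch self-contained — with the real library, instantiating the Histogram object is what makes it appear in the /metrics output):

```python
class Histogram:
    """Stand-in for prometheus_client.Histogram."""

    def __init__(self, name: str, documentation: str) -> None:
        self.name = name
        self.observations: list[float] = []

    def observe(self, value: float) -> None:
        self.observations.append(value)


class OrchestratorMetrics:
    def __init__(self, prefix: str) -> None:
        self._prefix = prefix
        self.first_token_histogram = None
        # Initially point the public method at the creator;
        # no metric object exists yet, so nothing gets exported
        self.observe_first_token = self._create_and_observe

    def _create_and_observe(self, seconds: float) -> None:
        # First call: create the metric, then swap the method
        # so later calls go straight to the update path
        self.first_token_histogram = Histogram(
            f"{self._prefix}_first_token_latency_seconds",
            "First token latency (seconds)",
        )
        self.observe_first_token = self._observe
        self._observe(seconds)

    def _observe(self, seconds: float) -> None:
        self.first_token_histogram.observe(seconds)
```

A service that never processes tokens never calls observe_first_token, so the histogram is never created and never exported.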
I dropped the pending metric doc update & rebased to main. I'll do it in a separate PR, where I'll fix additional issues I noticed that require a pending-requests metric type / name change.
Could not find any good fix for it, so I just filed a ticket on it: #1121
OK, so what you mean is that the dummy metrics would show zeros after initialization and before the first request, and users should not see wrong request counts... And you think k8s will scrape the metrics even when there are no requests, which is resource-consuming, so you decided to delay the creation until there are requests. I agree with this approach.
The dataprep microservice itself should not generate …
Technically the zero counts are not wrong, but the presence of token / LLM metrics is misleading for services that will never generate tokens (or use an LLM). That's the main reason for this PR.
Visibility: All OPEA originated services use HttpService, i.e. they provide HTTP access metrics [1]. To see those, …

Perf: I doubt skipping generation of the extra metrics has any noticeable perf impact on the service providing the metrics (currently …). Each Prometheus Histogram type provides about a dozen different metrics, and in larger clusters the amount of metrics needs to be reduced to keep telemetry stack resource usage & perf reasonable. Telemetry stack resource usage should be a significant concern only when there's a larger number of such pods, though.

[1] There's a large number of HTTP metrics, and some Python ones too. It would be good to have controls for limiting those in larger clusters, but I did not see any options for that in …
@Spycsh From your comment in the bug #1121 (comment) I realized that changing the method on first metric access is racy. It's possible that multiple threads end up in the create method before it is changed to the update one, meaning that multiple identical metrics would be created, and Prometheus would barf on that. => I'll add a lock & check to handle that.
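A sketch of such a lock-and-check fix (hypothetical names, with a plain list standing in for the Prometheus Histogram to keep it self-contained): only the first thread through the lock creates the metric; other threads re-check under the lock and fall through to a plain update.

```python
import threading


class LazyLatencyMetric:
    """Sketch of thread-safe, on-first-use metric creation."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._lock = threading.Lock()
        self.create_count = 0   # how many times "creation" actually ran
        self.histogram = None   # stands in for prometheus_client.Histogram

    def observe(self, seconds: float) -> None:
        if self.histogram is None:          # fast path: already created
            with self._lock:
                # Re-check under the lock: another thread may have
                # created the metric while this one was waiting
                if self.histogram is None:
                    self.create_count += 1
                    self.histogram = []     # one-time "creation"
        self.histogram.append(seconds)      # normal update


# Hammer it from many threads: creation must happen exactly once,
# otherwise the real Prometheus client would raise a duplicate-metric error
metric = LazyLatencyMetric("first_token_latency_seconds")
threads = [threading.Thread(target=metric.observe, args=(0.1,))
           for _ in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert metric.create_count == 1
assert len(metric.histogram) == 16
```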
Force-pushed from 23cd2c5 to 0a4e313
This avoids generation of useless token/request histogram metrics for services that use Orchestrator class, but never call its token processing functionality. (Helps in differentiating frontend megaservice metrics from backend megaservice ones, especially when multiple OPEA applications run in the same cluster.) Also change Orchestrator CI test workaround to use unique prefix for each metric instance, instead of metrics being (singleton) class variables. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
As that could be called from multiple request handling threads. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Force-pushed from 0a4e313 to 39cd93a
Rebased. @Spycsh Does this look fine / OK to merge, now that v1.2 got branched & tagged?
@eero-t Does it make sense to have a Gauge for every service, or just for the end-to-end application, as in the megaservice ChatQnA? I was just curious why we did not have some default absolute values for the histogram buckets for first token latency, given it is like an SLA. I did see some settings in GenAIEval (https://github.com/opea-project/GenAIEval/blob/55174246234fa458afeca138e759d91648d28d03/evals/benchmark/grafana/vllm_grafana.json and https://github.com/opea-project/GenAIInfra/blob/5b2cca97206e6d27e7ea31e6a38e38dc21eec404/kubernetes-addons/Observability/chatqna/dashboard/tgi_grafana.json) and I get it: it's really data collection first, with the display determined only in Grafana.
LGTM. |
Short answer: There's no way to differentiate whether the service instantiating the orchestrator class is a frontend or a middle service, so orchestrator metrics appear in whichever service instantiates the class. That is the problem this PR deals with.

Long answer: Every OPEA service does provide metrics, at least HTTP query stats, but those also include health and metric queries (regularly polled from the services by other k8s components), and do not include all the info relevant for service SLAs. The metrics this PR is concerned with are end-user request and token latencies, relevant for tracking service SLAs, as measured within the service orchestrator class (the only place where they actually can be measured). The problem this PR deals with is that the service orchestrator is/was/can be used also by OPEA components that do not process tokens, so providing (zero-valued) metrics for them would be misleading (especially if those services then show up unwanted in dashboards). It also makes megaservice metric behaviour similar to that of (TEI/TGI/vLLM) inferencing services, which do not provide metrics until they've processed their first request.
How well given buckets fit a given service depends completely on what kind of LLM model/params are used, whether inferencing is accelerated, how many backends there are, and to some extent on how stressed the service is. Prometheus already has defaults for histogram buckets, which are exponential. They are good enough that you'll see metrics spread over multiple buckets, regardless of the model/acceleration used. If one wanted more detail, buckets would need to be specified separately for each service and its underlying HW configuration, i.e. they would need to be externally configurable, and specified in some kind of service/HW profile.
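For illustration, the prometheus_client defaults are roughly exponential bucket bounds in seconds. If buckets were ever made externally configurable per service/HW profile as suggested, it could look something like the following sketch (the TOKEN_LATENCY_BUCKETS variable name is hypothetical, not an existing option):

```python
import os

# Approximately the prometheus_client default histogram buckets (seconds);
# the client library appends an implicit +Inf bucket on top of these
DEFAULT_BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25,
                   0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0)


def latency_buckets() -> tuple:
    """Return histogram buckets, optionally overridden per deployment,
    e.g. TOKEN_LATENCY_BUCKETS="0.1,0.25,0.5,1,2,5" tuned for a given
    model / HW profile (hypothetical variable name)."""
    spec = os.environ.get("TOKEN_LATENCY_BUCKETS", "").strip()
    if not spec:
        return DEFAULT_BUCKETS
    values = tuple(float(v) for v in spec.split(","))
    # Prometheus requires bucket bounds in increasing order
    assert list(values) == sorted(values), "buckets must be sorted"
    return values
```

The tuple returned here would then be passed as the buckets argument when creating the Histogram.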
Values given to Prometheus quantile queries are percentage thresholds, not the histogram (…). How accurate the quantile information is depends both on how well the values are spread over the buckets, and on how well that matches the quantile percentage thresholds, though.
- Fix the wrong _instance_id handling in opea-project#1092
- essential for opea-project/GenAIExamples#1528 UT pass

Signed-off-by: Spycsh <sihan.chen@intel.com>
Description
This avoids generating useless token / request histogram metrics for services that use Orchestrator class, but never call its token processing functionality. Such dummy metrics can confuse telemetry users.
(It also helps in differentiating frontend megaservice metrics from backend megaservice ones, especially when multiple OPEA applications with wrapper microservices run in the same cluster.)
Issues
n/a.

Type of change
Dependencies
n/a.

Tests
Manual testing with latest versions, to verify that: