Add E2E Prometheus metrics to applications #845
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Regarding the duplicate Creating multiple While => I think Note: PS. |
Unlike apps, CI tests create multiple of them. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
|
Rebased pre-commit changes to earlier commits, and pushed above described solution to the CI issue on enabling I'm currently testing whether I could get somewhat similar metric (reliably!) also from If that works, enabling the "inprogress" metrics for EDIT: And on further grepping tests seem to be testing on the unwanted |
Creating multiple MicroService()s creates multiple HTTPService()s, which in turn creates multiple Prometheus FastAPI instrumentor instances. While the latter handled that fine for ChatQnA and normal HTTP metrics, that was not the case for its "inprogress" metrics in CI. Therefore the MicroService constructor's name argument is now mandatory, so that it can be used to make the "inprogress" metrics of each HTTPService instance unique. PS. The instrumentor requires an HTTPService-instance-specific Starlette instance, so it cannot be made a singleton. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
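The naming problem described in this commit message can be pictured with a small sketch. It is not the PR's code: `ToyRegistry` is a hypothetical stand-in for a metrics registry that refuses duplicate metric names (the Prometheus Python client's registry behaves this way), and `inprogress_metric_name()` together with the `http_requests_inprogress` base name and the example service names are assumptions made for illustration.

```python
import re


class ToyRegistry:
    """Hypothetical stand-in for a metrics registry that rejects duplicate names."""

    def __init__(self):
        self._names = set()

    def register(self, name: str) -> str:
        if name in self._names:
            raise ValueError(f"Duplicated timeseries: {name}")
        self._names.add(name)
        return name


def inprogress_metric_name(service_name: str) -> str:
    """Build a per-instance metric name from a (hypothetical) service name."""
    # Prometheus metric names may only contain [a-zA-Z0-9_:], so replace the rest.
    safe = re.sub(r"[^a-zA-Z0-9_:]", "_", service_name)
    return f"http_requests_inprogress_{safe}"


registry = ToyRegistry()

# A fixed name works for the first instance but is rejected for the second.
registry.register("http_requests_inprogress")
try:
    registry.register("http_requests_inprogress")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

# Names derived from the mandatory instance name stay unique.
names = [
    registry.register(inprogress_metric_name(n))
    for n in ("chatqna-gateway", "llm/tgi")
]
```

Deriving the metric name from the instance name only helps if the names themselves are unique, which is presumably why the argument was made mandatory rather than given a default.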
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
for more information, see https://pre-commit.ci
|
LGTM |
|
@Spycsh, @lvliang-intel Any suggestions on where the new metrics should be documented: in the GenAIExamples repo, or the GenAIInfra repo? Or is it enough to add Prometheus serviceMonitors to the Helm charts for (the rest of) the OPEA applications, and some Grafana dashboards for them? |
|
Hi @eero-t, GenAIEval https://github.com/opea-project/GenAIEval/tree/main/evals/benchmark/grafana does track some Prometheus metrics and provides naive measurements of first token latency and average token latency, which are taken on the client side instead of through Prometheus. You are welcome to add some documentation there in the future. |
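For readers unfamiliar with the two latencies mentioned above, the following is a minimal sketch of how first token latency and average (inter-)token latency can be timed around a streamed response. It is an illustration only, not GenAIEval's or this PR's implementation; `TokenTimer` and its attribute names are hypothetical.

```python
import time
from typing import Callable, Iterable, Iterator, List, Optional


class TokenTimer:
    """Hypothetical helper that times tokens as they stream past."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter):
        self._clock = clock
        self.first_token_latency: Optional[float] = None
        self.inter_token_latencies: List[float] = []

    def wrap(self, tokens: Iterable[str]) -> Iterator[str]:
        # The clock starts when the consumer begins iterating, not when
        # wrap() is called, because this is a generator function.
        start = self._clock()
        previous = None
        for token in tokens:
            now = self._clock()
            if previous is None:
                self.first_token_latency = now - start
            else:
                self.inter_token_latencies.append(now - previous)
            previous = now
            yield token

    @property
    def avg_token_latency(self) -> Optional[float]:
        # Average gap between consecutive tokens; None with fewer than two tokens.
        if not self.inter_token_latencies:
            return None
        return sum(self.inter_token_latencies) / len(self.inter_token_latencies)
```

The same timing points apply whether the wrapper sits in a benchmark client or in the service frontend; what differs is whether network and queueing time between the two is included in the measurement.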
The Eval repo is for evaluating and benchmarking, whereas metrics provided by the service "frontend" are (also) for operational monitoring, i.e. normal, everyday usage of the service. I think the most appropriate place would be the Infra repo, as it already includes monitoring support both with Helm charts [1], and with separate manifest files + a couple of Grafana dashboards [2], but that's rather Kubernetes specific. [1] https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/monitoring.md |
Sure. Thanks for pointing out. |
* Fix typos in BaseStatistics method names
  Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
* Add HttpService "inprogress" (pending) request count metrics
  Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
* Add E2E Prometheus metrics to ServiceOrchestrator
  Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
* Fix: support metrics with multiple ServiceOrchestrator instances
  Unlike apps, CI tests create multiple of them.
  Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
* Fix: require named MicroService -> HTTPService instances
  Creating multiple MicroService()s creates multiple HTTPService()s which creates multiple Prometheus fastapi instrumentor instances. While latter handled that fine for ChatQnA and normal HTTP metrics, that was not the case for its "inprogress" metrics in CI. Therefore MicroService constructor name argument is now mandatory, so that it can be used to make "inprogress" metrics for HTTPService instances unique. PS. instrumentor requires HTTPService instance specific Starlette instance, so it cannot be made singleton.
  Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
* Fix: update test_token_generator()
  Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Description
This PR makes the following changes:
Issues
opea-project/GenAIExamples#391
Type of change
List the type of change like below. Please delete options that are not relevant.
Dependencies
No new ones.
(prometheus_fastapi_instrumentator, imported for HttpService, already imported the prometheus_client module to apps.)
Tests
Verified manually that the produced metrics match those from a benchmark that stresses the ChatQnA application.
Potential future changes (other PRs)
*_created metrics (prometheus_client.disable_created_metrics())?
ServiceOrchestrator object and all applications and tests creating them to provide unique name for the orchestrator instance, and use that as metric prefix. Instead of all orchestrator instances sharing the same set of megaservice_ prefixed singleton metrics...
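The difference between the two arrangements in the second item can be sketched as follows. This is only an illustration under stated assumptions: both class names, the `request_count` attribute, and the example orchestrator names are hypothetical, and plain integers stand in for real Prometheus metric objects.

```python
class SharedOrchestratorMetrics:
    """One metric set, created on first use and shared by every orchestrator."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.prefix = "megaservice"
            cls._instance.request_count = 0  # stand-in for a real counter
        return cls._instance


class PrefixedOrchestratorMetrics:
    """One metric set per orchestrator, prefixed with that orchestrator's name."""

    def __init__(self, name: str):
        self.prefix = name
        self.request_count = 0  # stand-in for a real counter


# Shared singleton: both handles refer to the same object and the same count.
a, b = SharedOrchestratorMetrics(), SharedOrchestratorMetrics()
a.request_count += 1

# Per-instance prefix: counts stay separate and are distinguishable by prefix.
x = PrefixedOrchestratorMetrics("chatqna")
y = PrefixedOrchestratorMetrics("docsum")
x.request_count += 1
```

With the shared variant, several orchestrators in one process (as in CI tests) aggregate into a single set of values; the prefixed variant keeps them apart, at the cost of requiring every caller to supply a unique name.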