diff --git a/helm-charts/HPA.md b/helm-charts/HPA.md
index ed81ae5d7..053eb62ff 100644
--- a/helm-charts/HPA.md
+++ b/helm-charts/HPA.md
@@ -26,7 +26,7 @@ Read [post-install](#post-install) steps before installation!

 ### Resource requests

-HPA controlled CPU pods SHOULD have appropriate resource requests or affinity rules (enabled in their
+HPA controlled _CPU_ pods SHOULD have appropriate resource requests or affinity rules (enabled in their
 subcharts and tested to work) so that k8s scheduler does not schedule too many of them on the same
 node(s). Otherwise they never reach ready state.

@@ -79,7 +79,7 @@ Why HPA is opt-in:
 - Top level chart name needs to conform to Prometheus metric naming conventions, as it is also used
   as a metric name prefix (with dashes converted to underscores)
 - Unless pod resource requests, affinity rules, scheduling topology constraints and/or cluster NRI
-  policies are used to better isolate service inferencing pods from each other, instances
+  policies are used to better isolate _CPU_ inferencing pods from each other, service instances
   scaled up on same node may never get to ready state
 - Current HPA rules are just examples, for efficient scaling they need to be fine-tuned for given
   setup performance (underlying HW, used models and data types, OPEA version etc)
@@ -94,8 +94,9 @@ ChatQnA includes pre-configured values files for scaling the services.
 To enable HPA, add `-f chatqna/hpa-values.yaml` option to your `helm install` command line.

 If **CPU** versions of TGI (and TEI) services are being scaled, resource requests and probe timings
-suitable for CPU usage need to be used. Add `-f chatqna/cpu-values.yaml` option to your `helm install`
-line. If you need to change model specified there, update the resource requests accordingly.
+suitable for CPU usage need to be used. `chatqna/cpu-values.yaml` provides an example of such constraints,
+which can be added (with the `-f` option) to your Helm install. As those values depend on the underlying HW,
+used model, data type and image versions, the specified resource values may need to be updated.

 ### Post-install

diff --git a/helm-charts/monitoring.md b/helm-charts/monitoring.md
index 506f310df..09c1ec37e 100644
--- a/helm-charts/monitoring.md
+++ b/helm-charts/monitoring.md
@@ -6,7 +6,6 @@
 - [Pre-conditions](#pre-conditions)
   - [Prometheus install](#prometheus-install)
   - [Helm options](#helm-options)
-- [Gotchas](#gotchas)
 - [Install](#install)
 - [Verify](#verify)

@@ -17,6 +16,10 @@
 which can be visualized e.g. in [Grafana](https://grafana.com/).
 Scaling the services automatically based on their usage with [HPA](HPA.md) also relies on these metrics.

+[Observability documentation](../kubernetes-addons/Observability/README.md)
+explains how to install additional monitoring for node and device metrics,
+and Grafana for visualizing those metrics.
+
 ## Pre-conditions

 ### Prometheus install
@@ -42,12 +45,6 @@
 provide that as `global.prometheusRelease` value for the OPEA service Helm install,
 or in its `values.yaml` file. Otherwise Prometheus ignores the installed `serviceMonitor` objects.

-## Gotchas
-
-By default Prometheus adds [k8s RBAC rules](https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/prometheus-roleBindingSpecificNamespaces.yaml)
-for detecting `serviceMonitor`s and querying metrics from `default`, `kube-system` and `monitoring` namespaces.
-If Helm is asked to install OPEA service to some other namespace, those rules need to be updated accordingly.
-
 ## Install

 Install Helm chart with `global.monitoring:true` option.
diff --git a/kubernetes-addons/Observability/README.md b/kubernetes-addons/Observability/README.md
index d6ce0f941..199e2c4bd 100644
--- a/kubernetes-addons/Observability/README.md
+++ b/kubernetes-addons/Observability/README.md
@@ -1,6 +1,8 @@
 # How-To Setup Observability for OPEA Workload in Kubernetes

-This guide provides a step-by-step approach to setting up observability for the OPEA workload in a Kubernetes environment. We will cover the setup of Prometheus and Grafana, as well as the collection of metrics for Gaudi hardware, OPEA/chatqna including TGI,TEI-Embedding,TEI-Reranking and other microservies, and PCM.
+This guide provides a step-by-step approach to setting up observability for the OPEA workload in a Kubernetes environment. We will cover the setup of Prometheus and Grafana, as well as the collection of metrics for Gaudi hardware, OPEA/chatqna including TGI, TEI-Embedding, TEI-Reranking and other microservices, and PCM.
+
+For monitoring Helm installed OPEA applications, see [Helm monitoring option](../../helm-charts/monitoring.md).

 ## Prepare