diff --git a/kubeai/README.md b/kubeai/README.md
index 13ead7d69..a007e33eb 100644
--- a/kubeai/README.md
+++ b/kubeai/README.md
@@ -14,6 +14,10 @@ For now, OPEA enables a subset of the KubeAI features. In the future more KubeAI
   - [Text Generation with Llama-3 on Gaudi](#text-generation-with-llama-3-on-gaudi)
   - [Text Embeddings with BGE on CPU](#text-embeddings-with-bge-on-cpu)
 - [Using the Models](#using-the-models)
+- [CPU Performance Optimization with NRI](#cpu-performance-optimization-with-nri)
+  - [Overview](#overview)
+  - [Installation of Balloons Policy Plugin](#installation-of-balloons-policy-plugin)
+  - [Configuration of Balloons Policy Plugin](#configuration-of-balloons-policy-plugin)
 - [Observability](#observability)
 
 ## Features
@@ -181,6 +185,119 @@ curl "http://localhost:8000/openai/v1/chat/completions" \
 
 Enjoy the answer!
 
+# CPU Performance Optimization with NRI
+
+## Overview
+
+[NRI plugins][nri-plugins] provide a way to
+optimize the resource placement of applications in a Kubernetes cluster. They
+connect to the container runtime and are able, for example, to adjust the CPU
+and memory pinning of containers.
+
+This section provides a guide on how to use the
+[Balloons Policy][balloons-policy] plugin from the [NRI Plugins][nri-plugins]
+project to optimize the performance of CPU-backed KubeAI profiles.
+
+## Installation of Balloons Policy Plugin
+
+> **NOTE:** To avoid disturbing already running workloads it is recommended to
+> install the NRI plugin to an empty node (do it right after node bootstrap, or
+> drain the node before installation).
+
+Install the balloons policy plugin with Helm:
+
+```bash
+helm repo add nri-plugins https://containers.github.io/nri-plugins
+helm repo update nri-plugins
+helm install -n kube-system balloons nri-plugins/nri-resource-policy-balloons
+```
+
+> **NOTE**: With containerd version earlier than v2.0 you need to enable
+> the NRI support in the containerd configuration file. Instead of manual
+> configuration you can provide `--set nri.runtime.patchConfig=true` to the Helm
+> command above, which will automatically patch the containerd configuration
+> file on each node.
+
+Verify that the balloons policy plugin is running on every node:
+
+```bash
+$ kubectl -n kube-system get ds nri-resource-policy-balloons
+NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
+nri-resource-policy-balloons   2         2         2       2            2           kubernetes.io/os=linux   77s
+```
+
+## Configuration of Balloons Policy Plugin
+
+The aim of the balloons policy configuration is to isolate the model (inference
+engine) containers to minimize the impact of containers on each other.
+
+An example configuration for the current CPU-backed model profiles:
+
+```yaml
+cat <