kubeai: document usage of NRI plugins for performance optimization #1113
eero-t merged 1 commit into opea-project:main
090725a to c3dd144
eero-t left a comment:
The README is getting quite long now, so I wonder whether this would make more sense as a separate document (linked from the README), as it's more about configuring the cluster than configuring KubeAI?
c6972e0 to 54dcb9d
Updated:
EDIT: added TOC
54dcb9d to 412da85
Let's keep it in a single file. There is room to trim the README to make it more compact.
I went back and forth on this myself. Originally I had a separate document, but then decided to put this in the README.
eero-t left a comment:
Approved, but I still have a few suggestions to slightly improve the text.
412da85 to 3d75552
Document a preferred setup of the Balloons Policy from the NRI Plugins project.
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
3d75552 to c197914
@marquiz, I needed to change [...] to [...], as otherwise the label match was not working and the policy placed the vllm container in the default balloon type. Logs looked like this: [...]

When prefixed with "pods/", the expression resulted in [...] in the log, and NRT looked as expected:

    kubectl get noderesourcetopologies.topology.node.k8s.io -o yaml

I think this is a usability issue in our expression evaluation. It would be reasonable to expect "labels/" to automatically match pod labels. What do you think?
Interestingly, the matchExpression works for me with the kubeai workload 🤔 I got (in NRT):

    - attributes:
      - name: cpuset
        value: "3"
      - name: memory set
        value: ""
      name: kubeai/model-qwen2.5-0.5b-cpu-5549fbccc5-cjq2f/server
      parent: kubeai-inference[0]

In the logs I see:

    ] allocating resources for container kubeai/model-qwen2.5-0.5b-cpu-5549fbccc5-cjq2f/server (request 1000 mCPU, limit 0 mCPU)...
    I0611 15:47:52.906534 1 log.go:476] D: [ policy ] choosing balloon type for container kubeai/model-qwen2.5-0.5b-cpu-5549fbccc5-cjq2f/server...
    I0611 15:47:52.906543 1 log.go:476] D: [ policy ] - checking expression <labels/app.kubernetes.io/name In vllm,ollama> of balloon type "kubeai-inference" against container kubeai/model-qwen2.5-0.5b-cpu-5549fbccc5-cjq2f/server...
    I0611 15:47:52.906551 1 log.go:476] D: [ policy ] - checking expression <name In server> of balloon type "kubeai-inference" against container kubeai/model-qwen2.5-0.5b-cpu-5549fbccc5-cjq2f/server...
    I0611 15:47:52.906553 1 log.go:476] D: [ policy ] => matches

So it put the container in the balloon even though not all of the matchExpressions matched. I think that's a bug (or at least an unwanted feature). I'd assume that matchExpressions work the same way as in Kubernetes, where all expressions are ANDed. WDYT?
Not so sure about this, as there are also container labels (at the CRI/NRI level). I'd leave that as is and possibly only note that caveat in the documentation.
See #1115
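The AND-versus-OR question raised in this thread can be illustrated with a small Python sketch. This is not code from the NRI Plugins project: the tuple layout, the function names (`evaluate`, `matches_all`, `matches_any`) and the container attributes are all hypothetical, and the sketch assumes that the `labels/app.kubernetes.io/name` key does not resolve for the container while `name` does. Under that assumption, Kubernetes-style ANDed evaluation rejects the container, whereas ORed evaluation accepts it, which would be consistent with the `=> matches` line in the log above.

```python
# Hypothetical, simplified model of a match expression: (key, operator, values).
# This only illustrates the semantics discussed above; it is not the plugin's code.
Expression = tuple[str, str, list[str]]


def evaluate(expr: Expression, attrs: dict[str, str]) -> bool:
    """Evaluate a single expression against a flat attribute map."""
    key, op, values = expr
    if op != "In":
        raise ValueError(f"operator {op!r} is not covered by this sketch")
    # A key that cannot be resolved yields None, which is never "In" the values.
    return attrs.get(key) in values


def matches_all(exprs: list[Expression], attrs: dict[str, str]) -> bool:
    """Kubernetes-style semantics: every expression must match (AND)."""
    return all(evaluate(e, attrs) for e in exprs)


def matches_any(exprs: list[Expression], attrs: dict[str, str]) -> bool:
    """Alternative semantics: one matching expression is enough (OR)."""
    return any(evaluate(e, attrs) for e in exprs)


# The two expressions that appear in the log excerpt above.
expressions: list[Expression] = [
    ("labels/app.kubernetes.io/name", "In", ["vllm", "ollama"]),
    ("name", "In", ["server"]),
]

# Assumption: only the container name resolves; the label key is not found
# at container scope because the label is set on the pod.
container = {"name": "server"}

and_result = matches_all(expressions, container)
or_result = matches_any(expressions, container)
print(f"AND semantics: {and_result}, OR semantics: {or_result}")
```

If the label were resolved from the pod instead (for example by adding it to the attribute map in this sketch), both evaluation modes would agree, which is why the difference only shows up when one of the keys fails to resolve.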