Add Intel GPU (Habana Gaudi) autoscaler support #8853
k8s-ci-robot merged 2 commits into kubernetes:master from
Welcome @DorWeinstock!
Hi @DorWeinstock. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
elmiko left a comment:
this makes sense to me, and i think it's a good upgrade. i have a couple questions about using the gpu package a little more.
Add support for Intel Habana Gaudi GPUs in the cluster autoscaler by:
- Defining the ResourceIntelGPU resource name (habana.ai/gaudi)
- Adding Intel GPU to the GPUVendorResourceNames list
- Refactoring the GPU detection logic to iterate through all GPU vendor resource names
  instead of checking vendors individually
This enables the autoscaler to properly detect and handle Intel GPU nodes
alongside existing NVIDIA, AMD, and DirectX GPU support.
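A minimal Go sketch of the refactored detection described in this commit message, assuming a heavily simplified data model. Only ResourceIntelGPU = habana.ai/gaudi and the idea of a GPUVendorResourceNames list come from the commit; the NVIDIA, AMD and DirectX resource strings, the ResourceName and Node types, and the nodeHasGpu helper are illustrative assumptions standing in for the real k8s.io/api types and the autoscaler's own code.

```go
package main

import "fmt"

// ResourceName is a hypothetical stand-in for Kubernetes' v1.ResourceName.
type ResourceName string

// ResourceIntelGPU's value is taken from the commit message; the other
// three resource strings are assumptions for illustration.
const (
	ResourceNvidiaGPU ResourceName = "nvidia.com/gpu"
	ResourceAMDGPU    ResourceName = "amd.com/gpu"
	ResourceDirectX   ResourceName = "microsoft.com/directx"
	ResourceIntelGPU  ResourceName = "habana.ai/gaudi"
)

// GPUVendorResourceNames lists every GPU extended resource to recognize.
// Supporting a new vendor means appending one entry here.
var GPUVendorResourceNames = []ResourceName{
	ResourceNvidiaGPU,
	ResourceAMDGPU,
	ResourceDirectX,
	ResourceIntelGPU,
}

// Node is a hypothetical, minimal stand-in for *apiv1.Node that keeps
// only the allocatable resource counts.
type Node struct {
	Name        string
	Allocatable map[ResourceName]int64
}

// nodeHasGpu (hypothetical) loops over all vendor resource names instead
// of checking each vendor with its own if-statement. A resource that is
// present but zero does not count as a GPU.
func nodeHasGpu(node *Node) bool {
	for _, name := range GPUVendorResourceNames {
		if qty, ok := node.Allocatable[name]; ok && qty > 0 {
			return true
		}
	}
	return false
}

func main() {
	gaudi := &Node{Name: "gaudi-node", Allocatable: map[ResourceName]int64{ResourceIntelGPU: 8}}
	cpuOnly := &Node{Name: "cpu-node", Allocatable: map[ResourceName]int64{"cpu": 16}}

	fmt.Println(gaudi.Name, nodeHasGpu(gaudi))     // gaudi-node true
	fmt.Println(cpuOnly.Name, nodeHasGpu(cpuOnly)) // cpu-node false
}
```

With the list-driven loop, the Intel case needs no vendor-specific branch: the habana.ai/gaudi entry in the slice is enough for the node to be detected.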
Force-pushed from 486d3b7 to 5873c7f
Extract the GPU allocatable detection loop into a new NodeHasGpuAllocatable helper function in utils/gpu/gpu.go. This eliminates code duplication across gpu_processor.go and makes the logic more maintainable.

The new function returns both the GPU allocatable value and whether it exists, allowing callers to get both pieces of information in a single call.

Changes:
- Add NodeHasGpuAllocatable() helper in utils/gpu/gpu.go
- Update NodeHasGpu() to use the new helper
- Simplify FilterOutNodesWithUnreadyResources() in gpu_processor.go
- Simplify GetNodeGpuTarget() in gpu_processor.go
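The helper extraction can be sketched in the same simplified model. This is a hedged illustration, not the code merged into utils/gpu/gpu.go: the (int64, bool) return type, the Node type, the label check inside nodeHasGpu, the example label key, and the gpuCount caller (loosely modeled on what GetNodeGpuTarget needs) are all hypothetical.

```go
package main

import "fmt"

// Hypothetical simplified types; the real code uses k8s.io/api/core/v1.
type ResourceName string

// ResourceIntelGPU comes from the PR; the other values are assumptions.
const (
	ResourceNvidiaGPU ResourceName = "nvidia.com/gpu"
	ResourceAMDGPU    ResourceName = "amd.com/gpu"
	ResourceDirectX   ResourceName = "microsoft.com/directx"
	ResourceIntelGPU  ResourceName = "habana.ai/gaudi"
)

var GPUVendorResourceNames = []ResourceName{ResourceNvidiaGPU, ResourceAMDGPU, ResourceDirectX, ResourceIntelGPU}

type Node struct {
	Labels      map[string]string
	Allocatable map[ResourceName]int64
}

// nodeHasGpuAllocatable (hypothetical) returns the allocatable count of the
// first GPU vendor resource present on the node and whether one was found.
// A resource that is present with a count of 0 yields (0, true).
func nodeHasGpuAllocatable(node *Node) (int64, bool) {
	for _, name := range GPUVendorResourceNames {
		if qty, ok := node.Allocatable[name]; ok {
			return qty, true
		}
	}
	return 0, false
}

// nodeHasGpu (hypothetical) reports a GPU if the node carries the GPU label
// or exposes a non-zero GPU allocatable, reusing the shared helper.
func nodeHasGpu(gpuLabel string, node *Node) bool {
	_, hasGpuLabel := node.Labels[gpuLabel]
	gpuAllocatable, hasGpuAllocatable := nodeHasGpuAllocatable(node)
	return hasGpuLabel || (hasGpuAllocatable && gpuAllocatable > 0)
}

// gpuCount (hypothetical caller) gets presence and value from one call
// instead of repeating the vendor loop itself.
func gpuCount(node *Node) int64 {
	if qty, ok := nodeHasGpuAllocatable(node); ok {
		return qty
	}
	return 0
}

func main() {
	const gpuLabel = "example.com/gpu-node" // hypothetical label key
	gaudi := &Node{Allocatable: map[ResourceName]int64{ResourceIntelGPU: 8}}
	zeroed := &Node{Allocatable: map[ResourceName]int64{ResourceIntelGPU: 0}}
	labeled := &Node{Labels: map[string]string{gpuLabel: "true"}}

	fmt.Println(nodeHasGpu(gpuLabel, gaudi), gpuCount(gaudi))     // true 8
	fmt.Println(nodeHasGpu(gpuLabel, zeroed), gpuCount(zeroed))   // false 0
	fmt.Println(nodeHasGpu(gpuLabel, labeled), gpuCount(labeled)) // true 0
}
```

Returning the value and the presence flag together means each caller scans the vendor list once through a single function, which matches the motivation given in the commit message; the label case shows a node that is treated as a GPU node before any allocatable has been reported.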
@yansun1996 do the non-Intel changes in this PR address your desired changes in #8865?
yep @jackfrancis, this PR makes the same changes as #8865
/test pull-cluster-autoscaler-e2e-azure-master
/ok-to-test |
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: DorWeinstock, jackfrancis

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/cherry-pick cluster-autoscaler-release-1.34 |
@jackfrancis: #8853 failed to apply on top of branch "cluster-autoscaler-release-1.34".
What type of PR is this?
/kind feature
What this PR does / why we need it:
Add support for Intel Habana Gaudi GPUs in the cluster autoscaler by:
This enables the autoscaler to properly detect and handle Intel GPU nodes alongside existing NVIDIA, AMD, and DirectX GPU support.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
The changes have been tested against the IBM Cloud provider.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: