Add Intel GPU (Habana Gaudi) autoscaler support by DorWeinstock · Pull Request #8853 · kubernetes/autoscaler

DorWeinstock · 2025-11-24T13:38:08Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

Add support for Intel Habana Gaudi GPUs in the cluster autoscaler by:

- Defining the ResourceIntelGPU resource name (habana.ai/gaudi)
- Adding Intel GPU to the GPUVendorResourceNames list
- Refactoring the GPU detection logic to iterate through all GPU vendor resource names instead of checking each vendor individually

This enables the autoscaler to properly detect and handle Intel GPU nodes alongside existing NVIDIA, AMD, and DirectX GPU support.
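
The refactor described above can be sketched as follows. This is a minimal, self-contained illustration, not the code merged in this PR: the `capacity` map stands in for `node.Status.Capacity`, `hasAnyVendorGPU` is a hypothetical name, and the NVIDIA, AMD, and DirectX resource strings are assumed to match the existing constants in utils/gpu/gpu.go. Only habana.ai/gaudi is taken directly from this description.

```go
package main

import "fmt"

// Resource names advertised by GPU device plugins. ResourceIntelGPU is the
// name added by this PR; the other three values are assumptions about the
// constants that already exist upstream.
const (
	ResourceNvidiaGPU = "nvidia.com/gpu"
	ResourceAMDGPU    = "amd.com/gpu"
	ResourceDirectX   = "microsoft.com/directx"
	ResourceIntelGPU  = "habana.ai/gaudi"
)

// GPUVendorResourceNames lists every vendor resource the detection loop checks.
var GPUVendorResourceNames = []string{
	ResourceNvidiaGPU,
	ResourceAMDGPU,
	ResourceDirectX,
	ResourceIntelGPU,
}

// capacity is a simplified stand-in for node.Status.Capacity, mapping a
// resource name to a plain count instead of a resource.Quantity.
type capacity map[string]int64

// hasAnyVendorGPU (hypothetical name) reports whether any known vendor
// resource is present with a non-zero count. Supporting a new vendor only
// requires a new entry in GPUVendorResourceNames, not another branch here.
func hasAnyVendorGPU(c capacity) bool {
	for _, name := range GPUVendorResourceNames {
		if count, found := c[name]; found && count > 0 {
			return true
		}
	}
	return false
}

func main() {
	gaudiNode := capacity{"cpu": 96, ResourceIntelGPU: 8}
	cpuNode := capacity{"cpu": 16}

	fmt.Println(hasAnyVendorGPU(gaudiNode)) // true
	fmt.Println(hasAnyVendorGPU(cpuNode))   // false
}
```

The design point is that the vendor list becomes the single place to extend, so callers no longer need a separate check per vendor.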

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Changes have been tested against the IBM Cloud provider.

Does this PR introduce a user-facing change?

Pods can now request habana.ai/gaudi as a valid resource.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

linux-foundation-easycla · 2025-11-24T13:38:14Z

The committers listed above are authorized under a signed CLA.

✅ login: DorWeinstock / name: Dor Weinstock (5873c7f, cc49907)

k8s-ci-robot · 2025-11-24T13:38:17Z

Welcome @DorWeinstock!

It looks like this is your first PR to kubernetes/autoscaler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/autoscaler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2025-11-24T13:38:18Z

Hi @DorWeinstock. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

elmiko

this makes sense to me, and i think it's a good upgrade. i have a couple questions about using the gpu package a little more.

Extract the GPU allocatable detection loop into a new NodeHasGpuAllocatable helper function in utils/gpu/gpu.go. This eliminates code duplication across gpu_processor.go and makes the logic more maintainable. The new function returns both the GPU allocatable value and whether it exists, allowing callers to get both pieces of information in a single call.

Changes:

- Add NodeHasGpuAllocatable() helper in utils/gpu/gpu.go
- Update NodeHasGpu() to use the new helper
- Simplify FilterOutNodesWithUnreadyResources() in gpu_processor.go
- Simplify GetNodeGpuTarget() in gpu_processor.go

DorWeinstock · 2025-11-25T18:00:16Z

func NodeHasGpuAllocatable(node *apiv1.Node) (gpuAllocatableValue int64, hasGpuAllocatable bool) has been implemented and is now called in both gpu.go and gpu_processor.go.
@elmiko, @vadasambar, please take another look.
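
For readers following along, here is a rough sketch of how a helper with that (value, found) return shape could be used by a caller that separates out GPU nodes whose devices are not yet allocatable. It is an assumption-laden illustration, not the merged implementation: the `node` struct replaces `*apiv1.Node`, counts are plain `int64` values instead of `resource.Quantity`, and `splitByGpuReadiness` and the `example.com/gpu-node` label are hypothetical.

```go
package main

import "fmt"

// gpuVendorResourceNames mirrors the vendor list described in this PR.
// Values other than habana.ai/gaudi are assumed, not quoted from the diff.
var gpuVendorResourceNames = []string{
	"nvidia.com/gpu",
	"amd.com/gpu",
	"microsoft.com/directx",
	"habana.ai/gaudi",
}

// node is a simplified stand-in for *apiv1.Node.
type node struct {
	name        string
	labels      map[string]string
	allocatable map[string]int64 // stand-in for node.Status.Allocatable
}

// nodeHasGpuAllocatable returns the first non-zero allocatable count found
// for any vendor resource, and whether such a resource was found at all.
func nodeHasGpuAllocatable(n node) (gpuAllocatableValue int64, hasGpuAllocatable bool) {
	for _, name := range gpuVendorResourceNames {
		if value, found := n.allocatable[name]; found && value > 0 {
			return value, true
		}
	}
	return 0, false
}

// splitByGpuReadiness (hypothetical) treats a node as unready when it carries
// the GPU label but does not yet expose any allocatable GPU, for example
// because the device plugin has not started.
func splitByGpuReadiness(nodes []node, gpuLabel string) (ready, unready []string) {
	for _, n := range nodes {
		_, labeled := n.labels[gpuLabel]
		_, hasGpu := nodeHasGpuAllocatable(n)
		if labeled && !hasGpu {
			unready = append(unready, n.name)
			continue
		}
		ready = append(ready, n.name)
	}
	return ready, unready
}

func main() {
	const gpuLabel = "example.com/gpu-node" // hypothetical label key

	nodes := []node{
		{name: "gaudi-1", labels: map[string]string{gpuLabel: "gaudi"}, allocatable: map[string]int64{"habana.ai/gaudi": 8}},
		{name: "gaudi-2", labels: map[string]string{gpuLabel: "gaudi"}, allocatable: map[string]int64{"cpu": 96}},
		{name: "cpu-1", allocatable: map[string]int64{"cpu": 16}},
	}

	ready, unready := splitByGpuReadiness(nodes, gpuLabel)
	fmt.Println(ready, unready) // [gaudi-1 cpu-1] [gaudi-2]
}
```

Returning the count together with the boolean lets one caller test for presence and another read the amount without each repeating the vendor loop.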

jackfrancis · 2025-11-26T22:03:16Z

@yansun1996 do the non-Intel changes in this PR address your desired changes in #8865 ?

yansun1996 · 2025-11-26T22:11:59Z

> @yansun1996 do the non-Intel changes in this PR address your desired changes in #8865 ?

Yep @jackfrancis, this PR makes the same changes as #8865.

jackfrancis · 2025-11-26T22:15:54Z

/test pull-cluster-autoscaler-e2e-azure-master

jackfrancis · 2025-11-26T22:16:54Z

/ok-to-test

jackfrancis

/lgtm
/approve

k8s-ci-robot · 2025-11-26T23:22:12Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: DorWeinstock, jackfrancis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~cluster-autoscaler/OWNERS~~ [jackfrancis]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jackfrancis · 2026-01-12T19:27:05Z

/cherry-pick cluster-autoscaler-release-1.34

k8s-infra-cherrypick-robot · 2026-01-12T19:27:42Z

@jackfrancis: #8853 failed to apply on top of branch "cluster-autoscaler-release-1.34":

Applying: Add Intel GPU (Habana Gaudi) autoscaler support
Using index info to reconstruct a base tree...
M	cluster-autoscaler/processors/customresources/gpu_processor.go
Falling back to patching base and 3-way merge...
Auto-merging cluster-autoscaler/processors/customresources/gpu_processor.go
CONFLICT (content): Merge conflict in cluster-autoscaler/processors/customresources/gpu_processor.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0001 Add Intel GPU (Habana Gaudi) autoscaler support

In response to this:

/cherry-pick cluster-autoscaler-release-1.34

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Nov 24, 2025

k8s-ci-robot added do-not-merge/needs-area Indicates that a PR should not merge because it lacks an area label. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. area/cluster-autoscaler Issues or PRs related to the Cluster Autoscaler component labels Nov 24, 2025

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 24, 2025

k8s-ci-robot removed the do-not-merge/needs-area Indicates that a PR should not merge because it lacks an area label. label Nov 24, 2025

k8s-ci-robot requested review from elmiko and vadasambar November 24, 2025 13:38

k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Nov 24, 2025

elmiko reviewed Nov 24, 2025

Comment thread cluster-autoscaler/processors/customresources/gpu_processor.go Outdated

Comment thread cluster-autoscaler/processors/customresources/gpu_processor.go Outdated

Comment thread cluster-autoscaler/processors/customresources/gpu_processor.go

DorWeinstock force-pushed the add-intel-gaudi-support branch from 486d3b7 to 5873c7f Compare November 25, 2025 17:42

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Nov 25, 2025

k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 25, 2025

jackfrancis mentioned this pull request Nov 26, 2025

[Feature] Make GPU utils function neutral for vendor's resource names #8629

Merged

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 26, 2025

jackfrancis approved these changes Nov 26, 2025

k8s-ci-robot assigned jackfrancis Nov 26, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 26, 2025

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 26, 2025

jackfrancis mentioned this pull request Nov 26, 2025

Make gpu_processor unready filter neutral to all GPU vendors #8865

Closed

k8s-ci-robot merged commit ffcbfee into kubernetes:master Nov 26, 2025
9 checks passed

jackfrancis mentioned this pull request Jan 12, 2026

[cluster-autoscaler-1.34] Add Intel GPU (Habana Gaudi) autoscaler support #9049

Merged
