AEP-8818: InPlace Update Mode (#8818)
Skipping CI for Draft Pull Request.

/kind api-review

/label kind/api-review

/label api-review

/lgtm @omerap12 If you have a draft PR that makes sense to review, then let me know, Omer.

@iamzili: changing LGTM is restricted to collaborators

I'll take a look at this next week @omerap12, sorry for the delay 🥲
### Infeasible Attempt Tracking

VPA must track `infeasible` resize attempts to prevent infinite retry loops. This is necessary because infeasibility can be detected at different points depending on the Kubernetes version:

| Kubernetes Version | When Infeasibility Is Detected | `spec.resources` After Attempt | How VPA Learns |
|--------------------|--------------------------------|--------------------------------|----------------|
| < 1.36 (or later if the KEP slips) | After the patch succeeds, kubelet reports status | Updated to attempted value | Resize status = Infeasible |
| >= 1.36 (targeted, not guaranteed) | At patch time, the API server rejects | Unchanged (old value) | Patch error response |

> **Note:** The Kubernetes sig-node team is *targeting* admission-time feasibility checks for 1.36, but this timeline is not guaranteed and may slip to a later release. VPA implements version-agnostic detection that works correctly regardless of which version introduces this change.
It may be worth linking to the issue/PR that discusses this.
Also, is this technically out of scope of the InPlace mode?
It's a problem we'll face for InPlaceOrRecreate also
I will link the PR.

> Also, is this technically out of scope of the InPlace mode?

I don't think so - this AEP should define a clear solution for it.

> It's a problem we'll face for InPlaceOrRecreate as well.

Agreed, but since `InPlaceOrRecreate` can recreate Pods, we should take a different approach there.

Good catch flagging this. We should adjust it before it's merged into k/k. SIG Node will likely put this behind a feature gate, but we'll need to account for that as well (e.g. someone enabling the feature gate).
> It's a problem we'll face for InPlaceOrRecreate as well.
> Agreed, but since `InPlaceOrRecreate` can recreate Pods, we should take a different approach there.

Yeah, good point, I guess it needs to handle the error rather than saving it. Maybe that can happen in the AEP for the `InPlaceOrRecreate` mode.
When thinking about the data structure that will store the infeasible attempts, one situation came to mind that I think we haven't discussed yet: what if we record an infeasible attempt for a pod, and then for some reason the pod is moved to another node (e.g. due to a node drain or node-pressure eviction), where the required resources suddenly become available? In that case, the updater would not try to apply the previously infeasible attempt to the pod, right?

I had thought about this, and had decided that pods can't move nodes. My assumption was that a pod can be destroyed and a new one recreated on a new node; however, maybe I'm wrong? EDIT: oh, what about StatefulSets? Those names don't really change. Maybe we need to store the UID rather than the pod?

we are fine with

Not sure I understand: a Pod named "my-first-statefulset-0" can be recreated and land on a new (larger) node, and still be called "my-first-statefulset-0", in which case we may not resize it, since it's still stored in the map (i.e. the problem you described in a previous comment).

ah, now I understand what you meant 🙂 Good point! You are right - this issue that I described, and that you also thought about, is really a problem for

We can rely on the pod's UID, which is guaranteed to be unique.
/kind api-change (Is this correct?)
Periodically, VPA removes entries from the `infeasibleAttempts` map for pods that no longer exist. This prevents memory leaks from accumulating stale entries. This cleanup behavior is targeted for beta.
**Key Difference from `InPlaceOrRecreate`**: In `InPlace` mode, `Deferred`, `Infeasible`, and `InProgress` statuses all result in waiting: VPA never falls back to eviction. In contrast, `InPlaceOrRecreate` mode may fall back to eviction after a timeout. This ensures that `InPlace` mode pods are never evicted, regardless of how long they remain in a non-updatable state.
> all result in waiting: VPA never falls back to eviction

We are thinking about creating a proposal for IPPR-triggered scheduler eviction, so I'd like to confirm my understanding of the Deferred condition handling here. What I am envisioning is something like this:

- VPA has a recommendation that could theoretically fit on the node, but that space is currently being occupied by some lower-priority pods. It initiates an in-place pod resize request (assume `InPlaceOnly` mode).
- Kubelet marks the resize as `Deferred` in the pod status.
- The scheduler watches for `Deferred` resizes, and preempts a lower-priority pod accordingly.
- After the lower-priority pod is removed, kubelet triggers a retry of the deferred resize and it succeeds.

From VPA's perspective, does this sound like a use case that the current InPlaceOnly proposal already supports?

cc @tallclair
Yes. The main point is this: with `Deferred`, we are only waiting (with `Infeasible`, there is more logic involved, e.g. remembering earlier recommendations).

> We are thinking about creating a proposal for IPPR-triggered scheduler eviction

Has the KEP been created already? I would like to review it :)
Thanks! I will CC you in the KEP.
Can you also CC me? Thanks!
soltysh left a comment:
@omerap12 and @adrianmoisey asked me to check this from API review pov, and from that perspective this looks good 👍
```go
cr, present := ip.podToReplicaCreatorMap[getPodID(pod)]
if present {
```
Nit: I'd suggest inverting the conditions here (yes, I'm aware this is just a design doc 😉 but I won't likely review the actual PR, so I'm leaving that comment here); that will make the code easier to understand. The same thing applies to the identical check a few lines below. In other words:

```go
cr, present := ip.podToReplicaCreatorMap[getPodID(pod)]
if !present {
	return utils.InPlaceDeferred
}
...
singleGroupStats, present := ip.creatorToSingleGroupStatsMap[cr]
if pod.Status.Phase == apiv1.PodPending {
	return utils.InPlaceDeferred
}
if !present {
	return utils.InPlaceDeferred
}
...
```

Thanks for the review. Fixed :)
Signed-off-by: Omer Aplatony <[email protected]>
/lgtm Thanks! This is amazing

/unhold

/approve

[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: adrianmoisey, omerap12, soltysh.
Signed-off-by: Omer Aplatony <[email protected]>
What type of PR is this?
/kind documentation
What this PR does / why we need it:
AEP for #8720
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: