Description
In what area(s)?
/area autoscale
/kind proposal
Describe the feature
Currently there are two config options that control scaling times:
stable-window: 60s
scale-to-zero-grace-period: 30s
I believe the first one (stable-window) controls how long an instance must be idle before the system considers it eligible for termination/scale-down. I believe it has a hard-coded minimum of 2 seconds.
The second one (scale-to-zero-grace-period) controls how much longer to wait before terminating the last instance, with a minimum value of 30 seconds.
These minimum values are much larger than what we'd like our system/users to experience. As with start-up times, we'd like scale-down times to be as small as possible, in the 50ms range. Yes, this means instances get created more often, but these are the semantics we're looking for.
Proposal: allow a value of "zero" to be specified for both options to indicate that instances should be terminated immediately upon being idle, or done processing all current requests. People can then set higher values as their needs dictate.
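To make the proposal concrete, here is a rough sketch of what the configuration might look like if zero were accepted. The two keys are the ones named above; the `0s` values are the proposed behavior and are an assumption, since today they would fall below the minimums. The ConfigMap name and namespace assume the usual `config-autoscaler` in `knative-serving` and may differ in your install.

```yaml
# Sketch only: zero values are what this issue proposes,
# not something the autoscaler accepts today.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler      # assumed: the standard autoscaler ConfigMap
  namespace: knative-serving   # assumed: default install namespace
data:
  # Proposed: treat an instance as eligible for scale-down as soon as it is idle.
  stable-window: "0s"
  # Proposed: no extra wait before terminating the last instance.
  scale-to-zero-grace-period: "0s"
```

Anyone who wants the current behavior would keep the existing values (60s and 30s) or set anything in between.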
Note: as with most (I think, if not all) of our config timing values, this would not be a guarantee but rather the desired behavior that the system would try to match. This is no different from the default values we have today: the combined value of 90 seconds regularly shows up (for me) as more like 100-120 seconds. Not a guarantee, just something for the system (autoscaler) to shoot for.