@JoeColeman95
Contributor

Adds `InstanceScaleInProtection` as a configurable option, as this is currently just hard-coded to `true`.

This will default to `true` to replicate the same behaviour, but enables users to disable this when/if required.

Updated launch script for our own CI to inject this value as `false`.
@JoeColeman95 JoeColeman95 requested a review from a team as a code owner November 18, 2025 14:55
@JoeColeman95 JoeColeman95 merged commit ad76d91 into main Nov 18, 2025
1 check passed
@JoeColeman95 JoeColeman95 deleted the SUP-5563-Adding-configurable-InstanceScaleInProtection branch November 18, 2025 16:33
@AliSoftware

Thanks for exposing this parameter!

I have some questions about the behavior and impact of changing this parameter.

Could this help solve our CF updates' deployment delays?

We've noticed recently that updating our CloudFormation Buildkite stacks¹ took more than 30 minutes, with the AutoScalingGroup stuck in the `DELETION_IN_PROGRESS` state for 32 minutes, even though we made sure to pause the Buildkite queue and wait for all pending jobs to finish before starting the CF deployment.

Could the fact that `NewInstancesProtectedFromScaleIn` was hardcoded to `true` until now be what prevented the graceful termination of the EC2 instances during the CF and ASG deployment? In other words, did the deployment wait for graceful termination, time out after 30 minutes, and ultimately fall back to forced termination (which would explain the ≈32 minutes)?

Any potential risk of side effects?

If so, I'd be tempted to add `InstanceScaleInProtection = false` in our `aws_cloudformation_stack` to fix this delay during deployments… but then I'm wondering whether there are any drawbacks or side effects we should be aware of, such as a risk of our Buildkite jobs being terminated mid-run because that protection is disabled.

Or maybe there are other template parameters that need to be set to values consistent with scale-in protection being disabled, so that scaling still behaves as expected? For example, so that it doesn't risk terminating instances mid-job when the ASG adjusts `desiredCapacity` to match the cluster queue demand or, on the contrary, never scales in to terminate extraneous instances after they've finished their jobs and reached the `ScaleInIdlePeriod`.
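For illustration, the change being considered would look roughly like the sketch below in Terraform. Only the `InstanceScaleInProtection` parameter name comes from this PR; the resource name, stack name, template URL variable, and capabilities are placeholders assumed for the example, not taken from any real configuration. Whether disabling the protection is safe for running jobs is exactly the open question above, so treat this as a sketch of the syntax rather than a recommendation.

```hcl
# Hypothetical variable: where the CloudFormation template is hosted.
variable "buildkite_template_url" {
  type        = string
  description = "URL of the Buildkite stack CloudFormation template (placeholder)."
}

# Hypothetical resource and stack names, for illustration only.
resource "aws_cloudformation_stack" "buildkite_agents" {
  name         = "buildkite-agents"
  template_url = var.buildkite_template_url

  # Assumed capabilities; adjust to whatever the template actually requires.
  capabilities = ["CAPABILITY_NAMED_IAM", "CAPABILITY_AUTO_EXPAND"]

  parameters = {
    # Parameter added by this PR; it defaults to "true".
    # CloudFormation parameter values are passed as strings.
    InstanceScaleInProtection = "false"
  }
}
```

Any other parameters already set on the stack would stay in the same `parameters` map; this sketch leaves them out.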

Thanks!

Footnotes

  1. We manage our CF stacks and Buildkite EC2 agents via Terraform, using `aws_cloudformation_stack` resources and `terraform apply`.

@AliSoftware
AliSoftware commented Nov 21, 2025

Also, if I try setting `InstanceScaleInProtection = true` on my templates, my ASG enters a never-ending cycle: it considers every newly created instance unhealthy right away, terminates it immediately, and starts a new one. Wash, rinse, repeat…

Shouldn't we set the ASG's Health check grace period to a value > 0 (or have it use a lifecycle hook) to prevent that from happening? 🤔
