Address infrastructure failures that cause cascading test failures
(464 failures across 46 builds over 3 weeks).
1. Make stale node GC tolerant (cluster.go):
- Ignore NotFound errors on node deletion (already gone)
- Log other delete errors as warnings instead of failing
- Only return error if zero nodes could be deleted
Evidence: 79 failures from 'failed to delete 3 stale nodes'
2. Fix existingVMSS map (cluster.go):
- Only add VMSS to existingVMSS if they are being kept
- VMSS queued for deletion are excluded, so their stale K8s
nodes can be cleaned up in the same pass
- If VMSS deletion fails, keep in map to avoid orphaned deletes
3. Retry Bastion subnet GET (cluster.go):
- Poll with backoff for up to 30s on transient ARM errors
- 404 still handled normally (create subnet)
Evidence: 179 failures from 'get subnet AzureBastionSubnet:
context deadline exceeded'
4. Retry AKS subnet + route table lookup (aks_model.go):
- Poll with backoff for up to 2 minutes
- Handles both transient GET failures and kubenet route table
propagation delays after cluster create/reuse
Evidence: 39 failures from 'AKS subnet has no route table'
5. Retry Firewall creation (aks_model.go):
- Poll with backoff for up to 10 minutes
- BeginCreateOrUpdate is idempotent, safe to retry
Evidence: 90 failures from 'failed to create Firewall:
context deadline exceeded'
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
+            toolkit.Logf(ctx, "route table not ready (retrying): %v", rtErr)
+            return false, nil
+        }
+        aksRTName = rtName
+        return true, nil
+    })
     if err != nil {
-        return err
+        return fmt.Errorf("failed to get AKS subnet and route table after retries: %w (last subnet error: %v, last route table error: %v)", err, lastSubnetErr, lastRTErr)
     }

     // Create AzureFirewallSubnet - this subnet name is required by Azure Firewall