
A-1136: stop detecting init container failure for sidecar containers#862

Merged
zhming0 merged 1 commit into main from ming/a-1136
Apr 17, 2026


@zhming0 zhming0 commented Apr 17, 2026

Problem

When a pod completes, native sidecar init containers (restartPolicy: Always) are terminated by the kubelet. If any sidecar exits with a non-zero code (137 from SIGKILL after the grace period, or 143 from SIGTERM if the process doesn't handle it gracefully), failOnInitContainerFailure misinterprets this as an init container startup failure and attempts to fail the already-finished Buildkite job. That API call returns 404.
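For reference, the two exit codes above follow the usual container-runtime convention of 128 plus the signal number. A minimal sketch of that arithmetic; the helper name `exitCodeForSignal` and the constants are illustrative and not taken from this PR:

```go
package main

import "fmt"

// Signal numbers as defined on Linux; declared locally so the sketch
// does not depend on platform-specific packages.
const (
	sigKill = 9
	sigTerm = 15
)

// exitCodeForSignal is a hypothetical helper: a process killed by a
// signal is conventionally reported with exit code 128 + signal number.
func exitCodeForSignal(sig int) int {
	return 128 + sig
}

func main() {
	fmt.Println(exitCodeForSignal(sigKill)) // 137
	fmt.Println(exitCodeForSignal(sigTerm)) // 143
}
```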

At high scale (200+ concurrent jobs), this creates a burst of blocking 404 API calls that stalls the informer event loop, delaying new job pickup.

Fixes #827.

Fix

Skip init containers with restartPolicy: Always in failOnInitContainerFailure. These are native K8s sidecar containers that run alongside the main containers, so their non-zero exit codes during pod shutdown are expected, not failures.
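The idea can be sketched roughly as follows. This is not the actual diff from this PR: the types are simplified local stand-ins for the Kubernetes core/v1 `Container` and `ContainerStatus` types, and `isSidecar` and `failedInitContainers` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes core/v1 types (assumption:
// only the fields relevant to this check are modelled here).
type ContainerRestartPolicy string

const ContainerRestartPolicyAlways ContainerRestartPolicy = "Always"

type Container struct {
	Name          string
	RestartPolicy *ContainerRestartPolicy // nil for ordinary init containers
}

type ContainerStatus struct {
	Name     string
	ExitCode *int32 // nil while the container has not terminated
}

// isSidecar reports whether an init container is a native sidecar,
// i.e. it declares restartPolicy: Always.
func isSidecar(c Container) bool {
	return c.RestartPolicy != nil && *c.RestartPolicy == ContainerRestartPolicyAlways
}

// failedInitContainers returns the names of init containers that
// terminated with a non-zero exit code, ignoring native sidecars.
func failedInitContainers(specs []Container, statuses []ContainerStatus) []string {
	sidecars := make(map[string]bool, len(specs))
	for _, c := range specs {
		if isSidecar(c) {
			sidecars[c.Name] = true
		}
	}

	var failed []string
	for _, s := range statuses {
		if sidecars[s.Name] {
			continue // non-zero exits at pod shutdown are expected for sidecars
		}
		if s.ExitCode != nil && *s.ExitCode != 0 {
			failed = append(failed, s.Name)
		}
	}
	return failed
}

func main() {
	always := ContainerRestartPolicyAlways
	killed, crashed := int32(137), int32(1)

	specs := []Container{
		{Name: "proxy", RestartPolicy: &always}, // native sidecar
		{Name: "setup"},                         // ordinary init container
	}
	statuses := []ContainerStatus{
		{Name: "proxy", ExitCode: &killed},
		{Name: "setup", ExitCode: &crashed},
	}

	fmt.Println(failedInitContainers(specs, statuses)) // [setup]
}
```

Matching on the container spec rather than on the exit code keeps genuine failures of ordinary init containers detectable, even when they happen to exit with 137 or 143.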

@zhming0 zhming0 requested a review from a team April 17, 2026 00:59
@zhming0 zhming0 requested a review from a team as a code owner April 17, 2026 00:59

@CerealBoy CerealBoy left a comment


LGTM 🚀 🚀

@zhming0 zhming0 merged commit cd3fea6 into main Apr 17, 2026
3 checks passed
@zhming0 zhming0 deleted the ming/a-1136 branch April 17, 2026 01:17

Development

Successfully merging this pull request may close these issues.

[BUG] failOnInitContainerFailure retries indefinitely on 404 API::Error::NotFound, causing zombie job/pod accumulation

2 participants