PB-5: improve exit code propogation for non-command containers by zhming0 · Pull Request #628 · buildkite/agent-stack-k8s

zhming0 · 2025-06-18T05:37:08Z

This PR slightly changes how we propagate errors from non-command containers to the Buildkite API.

On init container failure, we pass the first non-zero exit code from the init containers back to the backend.
On scheduler YAML failure, we raise a StackError signal reason, which helps indicate that the error is not retryable.

solves #575
solves PB-7 and PB-5
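
To illustrate the init container behaviour described above, here is a minimal sketch of picking the first non-zero exit code. It is not the code from this PR: `containerStatus` and `firstNonZeroExitCode` are hypothetical names, and the struct is a simplified stand-in for the Kubernetes container status type so that the example stays self-contained.

```go
package main

import "fmt"

// containerStatus is a simplified, hypothetical stand-in for a Kubernetes
// container status. Only the fields needed for this sketch are included.
type containerStatus struct {
	Name       string
	Terminated bool  // true once the container has exited
	ExitCode   int32 // meaningful only when Terminated is true
}

// firstNonZeroExitCode walks the init container statuses in order and
// returns the exit code of the first one that terminated with a non-zero
// code. The boolean is false when no init container has failed.
func firstNonZeroExitCode(statuses []containerStatus) (int32, bool) {
	for _, s := range statuses {
		if s.Terminated && s.ExitCode != 0 {
			return s.ExitCode, true
		}
	}
	return 0, false
}

func main() {
	statuses := []containerStatus{
		{Name: "checkout", Terminated: true, ExitCode: 0},
		{Name: "setup", Terminated: true, ExitCode: 42},
		{Name: "warmup", Terminated: true, ExitCode: 1},
	}
	if code, failed := firstNonZeroExitCode(statuses); failed {
		fmt.Printf("init container failed, reporting exit code %d\n", code)
	}
}
```

Returning a boolean alongside the code keeps "no failure" distinct from an exit code of 0, so the caller can decide separately what to report when nothing has failed.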

DrJosh9000

Some initial thoughts...

DrJosh9000

This is pretty good

zhming0 · 2025-06-19T01:34:17Z

+		// In scheduler worker, often these failure are technically users error in YAML.
+		// Using agent refuse seems to be reasonable as that represent that agent won't execute these yaml.
+		// Deserve more discussion though..
+		Reason: agent.SignalReasonAgentRefused,


@DrJosh9000 I think this change is debatable, subject to our confidence level. Just wanting to highlight this in case you have an opinion here.

Hmm. I think the cases here where failJob is called are all "pre-agent", so using "agent refused" feels wrong for that reason. We also can't rule out the Kubernetes cluster having a bad day and rejecting a valid job. "Stack error" seems like the best choice for now.

Maybe the agent change should have included a "stack rejected" reason, as well as "stack error"?

Ah true, the controller isn't an agent. I changed it to stack error, with a few comments explaining the caveats. 🙏🏿
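
The outcome of this thread can be sketched roughly as follows. This is an illustration only, not the PR's code: `signalReason`, `failureStage`, `reasonFor`, and the string values are hypothetical placeholders, standing in for the reason constants that the agent package defines (such as the `agent.SignalReasonAgentRefused` seen in the diff above).

```go
package main

import "fmt"

// signalReason is a hypothetical local type. The real agent package
// defines its own reason constants; the string values below are
// placeholders chosen for this sketch.
type signalReason string

const (
	reasonAgentRefused signalReason = "agent_refused" // placeholder value
	reasonStackError   signalReason = "stack_error"   // placeholder value
)

// failureStage records where a job failed (hypothetical for this sketch).
type failureStage int

const (
	// stageController: the failure happened in the controller or
	// scheduler, before any agent picked up the job.
	stageController failureStage = iota
	// stageAgent: a running agent declined to execute the job.
	stageAgent
)

// reasonFor maps a failure stage to a reason. Controller-side failures
// are "pre-agent", so they are reported as a stack error rather than an
// agent refusal, even though many are caused by invalid user YAML.
func reasonFor(stage failureStage) signalReason {
	switch stage {
	case stageAgent:
		return reasonAgentRefused
	default:
		return reasonStackError
	}
}

func main() {
	fmt.Println(reasonFor(stageController))
	fmt.Println(reasonFor(stageAgent))
}
```

A separate "stack rejected" reason, as floated above, would slot in as another case here if the agent ever defined one.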

DrJosh9000

LGTM!

zhming0 requested a review from a team June 18, 2025 05:37

zhming0 requested a review from a team as a code owner June 18, 2025 05:37

DrJosh9000 reviewed Jun 18, 2025


Comment thread internal/controller/scheduler/fail_job.go Outdated

Comment thread internal/controller/scheduler/fail_job.go Outdated

Comment thread internal/controller/scheduler/fail_job.go Outdated

zhming0 force-pushed the ming/pb-5 branch 2 times, most recently from 90fc67a to 819a190 June 19, 2025 00:59

zhming0 requested a review from DrJosh9000 June 19, 2025 00:59

DrJosh9000 reviewed Jun 19, 2025


zhming0 force-pushed the ming/pb-5 branch from 819a190 to d623aa3 June 19, 2025 01:31

zhming0 requested a review from DrJosh9000 June 19, 2025 01:33

zhming0 commented Jun 19, 2025


PB-5: improve exit code propogation for non-command containers

5a0da2a

zhming0 force-pushed the ming/pb-5 branch from d623aa3 to 5a0da2a June 19, 2025 04:02

DrJosh9000 approved these changes Jun 19, 2025


zhming0 merged commit 35eabbd into main Jun 19, 2025
1 check passed

zhming0 deleted the ming/pb-5 branch June 19, 2025 05:13

zhming0 mentioned this pull request Jun 23, 2025

RFC: Use different exit statuses for different failures #500

Closed

zhming0 merged 1 commit into main from ming/pb-5


2 participants
