Attempt to prevent duplicate workflow dispatch by marking jobs in progress at build start #74
wilg wants to merge 1 commit into game-ci:main
Three targeted fixes that close the race windows causing cascading
job failures and infinite re-dispatch loops:
1. Make registerNewBuild idempotent (ciBuilds.ts)
   - If build already exists with status "started" and same jobId,
     silently succeed (handles network timeout retries)
   - If build already exists with status "published", silently succeed
   - If build already exists with status "failed", overwrite with
     "started" (existing retry behavior, preserved)
   - If build exists with "started" but different jobId, throw with
     a descriptive error message
   Inspired by PR game-ci#73.
2. Add retry limits to base/hub image dispatch (scheduler.ts)
   - Check job failureCount against maxFailuresPerBuild (15) before
     re-dispatching base or hub image workflows
   - Log a warning and send a Discord alert when the limit is reached
   - Prevents infinite re-dispatch on every cron cycle when a
     base/hub job is stuck in "created" or "failed" state
   Uses new CiJobs.hasExceededRetryLimit() helper (ciJobs.ts).
3. Allow created -> inProgress transition (ciJobs.ts)
   - markJobAsInProgress now accepts jobs with status "created" in
     addition to "scheduled"
   - Closes the race window where the scheduler dispatches a workflow
     but crashes before updating Firestore from "created" to "scheduled"
   - The workflow's reportNewBuild call now moves the job out of the
     schedulable state regardless of whether the scheduler updated it
   Inspired by PR game-ci#74.
Co-Authored-By: Claude Opus 4.6
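Item 1 of the commit message amounts to a small decision table for registerNewBuild. The TypeScript sketch below encodes that table against an in-memory Map standing in for Firestore. BuildRecord, decideRegistration and the Map-based store are hypothetical; the real ciBuilds.ts signature, document fields and error text may differ.

```typescript
// Hypothetical build record shape; the real Firestore document may differ.
type BuildStatus = "started" | "published" | "failed";

interface BuildRecord {
  status: BuildStatus;
  jobId: string;
}

// What a registerNewBuild call should do, following the commit message.
type RegistrationAction = "create" | "noop" | "overwrite";

function decideRegistration(
  buildId: string,
  existing: BuildRecord | undefined,
  jobId: string,
): RegistrationAction {
  if (existing === undefined) return "create";
  switch (existing.status) {
    case "started":
      // Same job reporting again, e.g. a retry after a network timeout.
      if (existing.jobId === jobId) return "noop";
      throw new Error(
        `Build "${buildId}" was already started by job "${existing.jobId}", not "${jobId}"`,
      );
    case "published":
      return "noop";
    case "failed":
      return "overwrite"; // existing retry behaviour: restart a failed build
  }
}

// Hypothetical in-memory stand-in for the real Firestore-backed function.
function registerNewBuild(
  store: Map<string, BuildRecord>,
  buildId: string,
  jobId: string,
): RegistrationAction {
  const action = decideRegistration(buildId, store.get(buildId), jobId);
  if (action !== "noop") {
    store.set(buildId, { status: "started", jobId });
  }
  return action;
}
```

Only the "started by a different job" case throws, so a genuine duplicate still surfaces as an error while harmless retries succeed silently.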
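Item 2's retry guard can be sketched the same way. This assumes a job record with a numeric failureCount and treats reaching the limit (failureCount >= 15) as exceeded. CiJob, maybeRedispatch and the dispatch/alert callbacks are hypothetical stand-ins for the scheduler's workflow dispatch and Discord alert; only the names hasExceededRetryLimit and maxFailuresPerBuild come from the commit message, and their bodies here are assumptions.

```typescript
// Hypothetical job shape; only failureCount is named in the commit message.
interface CiJob {
  id: string;
  status: "created" | "scheduled" | "inProgress" | "failed" | "completed";
  failureCount: number;
}

const maxFailuresPerBuild = 15; // value quoted in the commit message

// Sketch of a CiJobs.hasExceededRetryLimit()-style helper (assumed semantics).
function hasExceededRetryLimit(job: CiJob, limit: number = maxFailuresPerBuild): boolean {
  return job.failureCount >= limit;
}

// Hypothetical scheduler step for base/hub image jobs.
// Returns true only if the workflow was (re-)dispatched.
function maybeRedispatch(
  job: CiJob,
  dispatch: (job: CiJob) => void,
  alert: (message: string) => void,
): boolean {
  if (job.status !== "created" && job.status !== "failed") return false;
  if (hasExceededRetryLimit(job)) {
    const message = `Job ${job.id} reached ${job.failureCount} failures; not re-dispatching`;
    console.warn(message);
    alert(message); // the real scheduler sends a Discord alert here
    return false;
  }
  dispatch(job);
  return true;
}
```

Checking the limit before dispatching, rather than only after a failure, is what keeps a stuck base or hub job from being re-dispatched on every cron cycle.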
What problem this fixes
We’ve seen reportNewBuild fail with:
A build with "<buildId>" as identifier already exists.
That happens when the same workflow/build gets started twice, creating duplicate build reports for the same buildId.
Why it can happen
Right now, the scheduler:
1. dispatches the workflow, then
2. updates the job from created -> scheduled.
Because dispatch is an external side effect, there’s a small window where the workflow has already been dispatched but the job is still created in Firestore (if we crash/time out before the update). Since the scheduler only picks jobs with status == "created", that job can be dispatched again later.
What this change does
When a workflow starts, it calls reportNewBuild, which marks the job as in progress. This PR makes that “build started” signal authoritative by allowing:
created -> inProgress (in addition to scheduled -> inProgress)
So even if the scheduler never wrote created -> scheduled, the job still moves out of the schedulable state as soon as the workflow reports it started.
Why this is safe
- … failed or completed.
- … buildId reports still fail as before.
- … created.
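A minimal TypeScript sketch of the transition rule and the crash window described above. Job, JobStatus, schedulerPass and the crashAfterDispatch flag are hypothetical; markJobAsInProgress is named in the commit message, but the body here is an assumption, not the code in ciJobs.ts. It also assumes that jobs already failed or completed are left untouched (the sketch returns false; the real function may throw or log instead).

```typescript
// Hypothetical status union; the real ciJobs.ts may define more states.
type JobStatus = "created" | "scheduled" | "inProgress" | "failed" | "completed";

interface Job {
  id: string;
  status: JobStatus;
}

// States a "build started" report may move to inProgress.
// Previously only "scheduled"; this change also accepts "created".
const startableStates: ReadonlySet<JobStatus> = new Set<JobStatus>(["created", "scheduled"]);

// Sketch of markJobAsInProgress: any other state (e.g. failed, completed)
// is left unchanged and reported as false.
function markJobAsInProgress(job: Job): boolean {
  if (!startableStates.has(job.status)) return false;
  job.status = "inProgress";
  return true;
}

// Hypothetical scheduler pass: dispatch every "created" job, then record
// created -> scheduled. crashAfterDispatch simulates dying between the steps.
function schedulerPass(
  jobs: Job[],
  dispatch: (job: Job) => void,
  crashAfterDispatch: boolean,
): void {
  for (const job of jobs) {
    if (job.status !== "created") continue;
    dispatch(job);
    if (crashAfterDispatch) return; // Firestore still says "created"
    job.status = "scheduled";
  }
}

// Replay the race: crash after dispatch, workflow starts, scheduler runs again.
const job: Job = { id: "base-image", status: "created" };
const dispatches: string[] = [];
schedulerPass([job], (j) => dispatches.push(j.id), true);
markJobAsInProgress(job); // the workflow's reportNewBuild call
schedulerPass([job], (j) => dispatches.push(j.id), false);
console.log(dispatches.length, job.status); // 1 inProgress
```

Without "created" in startableStates, markJobAsInProgress would return false in the replay, the job would stay "created", and the second pass would dispatch it a second time.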