Skip to content

Attempt to prevent duplicate workflow dispatch by marking jobs in progress at build start#74

Closed
wilg wants to merge 1 commit intogame-ci:mainfrom
wilg:wilg/attempt-2
Closed

Attempt to prevent duplicate workflow dispatch by marking jobs in progress at build start#74
wilg wants to merge 1 commit intogame-ci:mainfrom
wilg:wilg/attempt-2

Conversation

@wilg
Copy link

@wilg wilg commented Dec 13, 2025

What problem this fixes

We’ve seen reportNewBuild fail with: A build with "<buildId>" as identifier already exists. That happens when the same workflow/build gets started twice, creating duplicate build reports for the same buildId.

Why it can happen

Right now, the scheduler:

  • Dispatches the GitHub Actions workflow first, then
  • Updates Firestore to move the job from created -> scheduled

Because dispatch is an external side-effect, there’s a small window where:

  • GitHub has accepted the dispatch (workflow may already be running), but
  • The job can still be created in Firestore (if we crash/time out before the update)

Since the scheduler only picks jobs with status == "created", that job can be dispatched again later.

What this change does

When a workflow starts, it calls reportNewBuild, which marks the job as in progress.
This PR makes that “build started” signal authoritative by allowing:

  • created -> inProgress (in addition to scheduled -> inProgress)

So even if the scheduler never wrote created -> scheduled, the job still moves out of the schedulable state as soon as the workflow reports it started.

Why this is safe

  • Does not override terminal states like failed or completed.
  • Does not change the duplicate-build protection: duplicate buildId reports still fail as before.
  • It only closes the gap where a running workflow could leave its job marked created.

Summary by CodeRabbit

  • Bug Fixes
    • Fixed CI job state transitions to allow jobs in the created state to progress to in-progress status, in addition to jobs in the scheduled state.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 13, 2025

Walkthrough

The status transition logic in CiJobs.markJobAsInProgress() was expanded to accept jobs from both 'created' and 'scheduled' states (previously only 'scheduled') when marking them as in progress, while maintaining all other existing behavior and error handling.

Changes

Cohort / File(s) Summary
Job status transition logic
functions/src/model/ciJobs.ts
Expanded the markJobAsInProgress() method's condition to allow state transitions from both 'created' and 'scheduled' to 'inProgress', broadening the permissible source states for job activation.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

  • Verify the business logic rationale for allowing 'created' state transitions in addition to 'scheduled'
  • Confirm no downstream assumptions depend solely on 'scheduled' as the only preceding state
  • Check for related tests or state machine documentation that may need updates

Poem

🐰 A job once stuck in "created" limbo,
Now hops to "inProgress" with a swift boing—
Two paths instead of one to flow,
Status transitions dancing in a ring! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main objective of the PR: preventing duplicate workflow dispatch by marking jobs as in progress at build start, which matches the core change to markJobAsInProgress.
Description check ✅ Passed The description is comprehensive and addresses the required template sections, providing detailed context about the problem, root cause, solution, and safety considerations.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a0089cf and db99889.

📒 Files selected for processing (1)
  • functions/src/model/ciJobs.ts (1 hunks)
🔇 Additional comments (1)
functions/src/model/ciJobs.ts (1)

208-208: Change successfully addresses the race condition window.

The expanded condition correctly allows workflows to transition from created -> inProgress, preventing re-dispatch by the scheduler even when the scheduled state was never written. This narrows the race window where duplicate workflows could be dispatched.

The terminal state protection (lines 206-210) is properly maintained—jobs in failed, completed, or other terminal states won't be overridden.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@webbertakken webbertakken mentioned this pull request Dec 13, 2025
3 tasks
frostebite added a commit to frostebite/versioning-backend that referenced this pull request Mar 14, 2026
Three targeted fixes that close the race windows causing cascading
job failures and infinite re-dispatch loops:

1. Make registerNewBuild idempotent (ciBuilds.ts)
   - If build already exists with status "started" and same jobId,
     silently succeed (handles network timeout retries)
   - If build already exists with status "published", silently succeed
   - If build already exists with status "failed", overwrite with
     "started" (existing retry behavior, preserved)
   - If build exists with "started" but different jobId, throw with
     a descriptive error message
   Inspired by PR game-ci#73.

2. Add retry limits to base/hub image dispatch (scheduler.ts)
   - Check job failureCount against maxFailuresPerBuild (15) before
     re-dispatching base or hub image workflows
   - Log a warning and send a Discord alert when the limit is reached
   - Prevents infinite re-dispatch on every cron cycle when a
     base/hub job is stuck in "created" or "failed" state
   Uses new CiJobs.hasExceededRetryLimit() helper (ciJobs.ts).

3. Allow created -> inProgress transition (ciJobs.ts)
   - markJobAsInProgress now accepts jobs with status "created" in
     addition to "scheduled"
   - Closes the race window where scheduler dispatches a workflow but
     crashes before updating Firestore from "created" to "scheduled"
   - The workflow's reportNewBuild call now moves the job out of the
     schedulable state regardless of whether the scheduler updated it
   Inspired by PR game-ci#74.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant