@jeanschmidt jeanschmidt commented Jan 30, 2026

Summary

A user accidentally added the label ci: disable-autorevert to a PR. This was an honest mistake, but it silently disabled autorevert for the whole repo for 2 days. It happened because we don't validate who can use this label, and because the GitHub API treats PRs as issues while our query only filters by issues.

To address this, this PR:

  • Limits the autorevert circuit breaker so that it reacts only to ci: disable-autorevert labels added to issues, not PRs;
  • Limits who can use that label to trigger the circuit breaker;
  • Lets users know when they accidentally add the ci: disable-autorevert label to a PR, when they most likely wanted autorevert: disable.

Add guardrails to autorevert circuit breaker

Problem

The autorevert circuit breaker was activating on any open item (issue or PR) with the ci: disable-autorevert label. This created two issues:

  • PRs were triggering the circuit breaker - The label was accidentally applied to a PR, disabling autorevert system-wide
  • No authorization control - Any user could disable autorevert by creating an issue with the label

Solution

Added two guardrails to the circuit breaker:

  1. Ignore Pull Requests
    Only actual GitHub issues can activate the circuit breaker
    PRs with the label are logged and skipped
    Rationale: Circuit breaker is meant for explicit manual intervention via issues, not automatic PR labeling
  2. Approved Users List
    Added an optional approved_users parameter to restrict who can disable autorevert
    Defaults to:

DEFAULT_CIRCUIT_BREAKER_APPROVED_USERS: set[str] = set(
    "albanD",
    "atalman",
    "drisspg",
    "ezyang",
    "huydhn",
    "izaitsevfb",
    "janeyx99",
    "jeanschmidt",
    "malfet",
    "seemethere",
    "wdvr",
    "yangw-dev",
    "ZainRizvi",
)
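
To illustrate how the two guardrails fit together, here is a minimal sketch, not the code from this PR. The function name should_disable_autorevert and the use of the issue author as the user checked against approved_users are assumptions for illustration. The sketch relies on the fact that GitHub's REST API returns pull requests from issue endpoints and marks them with a pull_request key.

```python
import logging
from typing import Any, Iterable, Optional

logger = logging.getLogger(__name__)

DISABLE_LABEL = "ci: disable-autorevert"


def should_disable_autorevert(
    open_items: Iterable[dict[str, Any]],
    approved_users: Optional[set[str]] = None,
) -> bool:
    """Return True if an open item legitimately trips the circuit breaker.

    `open_items` are dicts shaped like GitHub REST API issue objects.
    The function name and the author-based check are hypothetical.
    """
    for item in open_items:
        label_names = {label["name"] for label in item.get("labels", [])}
        if DISABLE_LABEL not in label_names:
            continue

        # Guardrail 1: issue endpoints also return PRs; PR entries carry
        # a "pull_request" key, plain issues do not.
        if "pull_request" in item:
            logger.info("Skipping PR #%s: label only applies to issues", item["number"])
            continue

        # Guardrail 2: when an approved list is given, only issues opened
        # by approved users count (assumption: the issue author is checked).
        author = item["user"]["login"]
        if approved_users is not None and author not in approved_users:
            logger.info("Skipping issue #%s: %s is not approved", item["number"], author)
            continue

        return True
    return False
```

With approved_users left as None, every labeled issue still trips the breaker, which keeps the parameter optional as described above.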

Add validation for ci: disable-autorevert label on pull requests

Summary

Adds automatic validation to prevent users from accidentally using the ci: disable-autorevert label on pull requests. This label is intended for issues (to disable the entire autorevert system via circuit breaker) and is commonly confused with autorevert: disable (which prevents autorevert on a specific PR).

Changes

  • autoLabelBot.ts: Added handleDisableAutorevertLabel() helper function that:
    • Detects when ci: disable-autorevert is added to a PR (either at creation or later)
    • Automatically removes the incorrect label
    • Posts a helpful comment explaining the difference and suggesting the correct label
    • Prevents duplicate warnings by checking for existing warning comments

Behavior

  • When a user adds ci: disable-autorevert to a PR, the bot will:
    • Remove the label immediately
    • Post a comment (once) explaining:
      • This label is for issues only (disables entire autorevert system)
      • For PRs, use autorevert: disable instead
    • Silently remove the label on subsequent attempts without spamming comments
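
The behavior above can be modeled as a small decision function. The real helper lives in autoLabelBot.ts and is TypeScript; the Python sketch below only models the decision logic. The function name plan_actions, the action strings, and the hidden comment marker used for de-duplication are all hypothetical.

```python
WRONG_LABEL = "ci: disable-autorevert"

# Hypothetical marker embedded in the warning comment so that later runs
# can tell the warning was already posted.
WARNING_MARKER = "<!-- disable-autorevert-warning -->"


def plan_actions(
    is_pull_request: bool,
    labels: list[str],
    existing_comments: list[str],
) -> list[str]:
    """Return the actions the bot would take for one labeling event."""
    # The label is valid on issues, so only PRs carrying it are handled.
    if not is_pull_request or WRONG_LABEL not in labels:
        return []

    # The label is always removed from a PR.
    actions = ["remove_label"]

    # The explanatory comment is posted only once per PR.
    already_warned = any(WARNING_MARKER in body for body in existing_comments)
    if not already_warned:
        actions.append("post_warning")
    return actions
```

The first mislabel yields both actions; a repeat on the same PR yields only the label removal, which matches the "without spamming comments" behavior.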

@pytorch-bot pytorch-bot bot added the ci-no-td label Jan 30, 2026
@meta-cla meta-cla bot added the CLA Signed label Jan 30, 2026

@izaitsevfb izaitsevfb left a comment

please fix the runtime issue with the set

ideally, add tests that should've been tripped by this issue

and PLEASE run it locally!

DEFAULT_WORKFLOWS = ["Lint", "trunk", "pull", "inductor", "linux-aarch64", "slow"]
DEFAULT_WORKFLOW_RESTART_DAYS = 7
# Users authorized to disable autorevert via circuit breaker issue
DEFAULT_CIRCUIT_BREAKER_APPROVED_USERS: set[str] = set(

this would fail at import time with TypeError: set expected at most 1 argument, got 13

please construct set as a literal, or from an iterable
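
For reference, a minimal sketch of the two suggested constructions, with the user list abbreviated to three names from the PR. The constant and function names are illustrative only.

```python
# Option 1: a set literal.
APPROVED_USERS_LITERAL: set[str] = {"albanD", "atalman", "drisspg"}

# Option 2: set() called with a single iterable argument.
APPROVED_USERS_FROM_ITERABLE: set[str] = set(["albanD", "atalman", "drisspg"])


def broken_construction_raises() -> bool:
    """Show that passing each name as its own argument raises TypeError."""
    try:
        set("albanD", "atalman", "drisspg")  # type: ignore[call-arg]
    except TypeError:
        return True
    return False


# Related pitfall: a single string is itself an iterable of characters,
# so set("ab") builds {"a", "b"}, not {"ab"}.
CHARS_NOT_NAMES = set("ab")
```

Because the broken call sits at module level in the PR, a test that does nothing more than import the module would already have caught it.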

@@ -26,11 +37,29 @@ def check_autorevert_disabled(repo_full_name: str = "pytorch/pytorch") -> bool:
should_disable = False

this looks redundant, let's remove it?

huydhn commented Jan 31, 2026

With auto-revert working in auto-pilot mode, have we considered removing the ci: disable-autorevert label altogether? That would mean removing the ability to disable auto-revert via a GH issue, instead of trying to secure it this way.

My thought process is:

  • Even if I, as oncall, have permission to add ci: disable-autorevert, I probably wouldn't do it without consulting you and Ivan first. So it doesn't seem like this feature will be used often, or by a larger group. Removing it means no one can use it the wrong way
  • With a smaller group of power users, it's easy to find another way to disable auto-revert completely, for example by setting an env variable on the lambda or disabling the trigger. It's a bit more work, but it can be covered in the runbook, and whoever does it knows exactly what it does
  • Less code to maintain is a nice little bonus, and we also don't need to think about who we should add
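
As a rough illustration of the env variable option mentioned above, a kill switch read at the start of the lambda handler could look like the sketch below. The variable name AUTOREVERT_DISABLED and the accepted values are assumptions; nothing in this thread specifies them.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; to be set on the lambda configuration.
KILL_SWITCH_ENV_VAR = "AUTOREVERT_DISABLED"


def autorevert_disabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the kill switch variable holds a truthy value."""
    env = os.environ if environ is None else environ
    value = env.get(KILL_SWITCH_ENV_VAR, "")
    # Accept a few common truthy spellings; anything else leaves
    # autorevert enabled.
    return value.strip().lower() in {"1", "true", "yes"}
```

Only people with access to the lambda configuration can flip such a switch, which is the access model suggested in the second bullet.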

jeanschmidt (Contributor, Author) commented

@huydhn I guess you have a point. If everyone on the team is capable of going to AWS and disabling the lambda schedulers, this should do the trick.

So let's remove all the short-circuit logic then, as it is extra complexity

@jeanschmidt jeanschmidt closed this Feb 2, 2026