Skip to content

Conversation

@jeanschmidt
Copy link
Contributor

Changes lambda entry point for autorevert to enable better handling eventbridge parameters.

The changes are:

  • Moved entry point from a global main to specific ones for CLI and lambda;
  • Moved main execution logic to a reused function;
  • Created a general config holder interface dataclass;

this enables to use the lambda with different configs from eventbridge like:

resource "aws_cloudwatch_event_rule" "pytorch_auto_revert_5m" {
  name                = "pytorch-auto-revert-5m"
  description         = "Every 5 minutes with config A"
  schedule_expression = "rate(5 minutes)"
}

resource "aws_cloudwatch_event_target" "pytorch_auto_revert_5m_target" {
  rule      = aws_cloudwatch_event_rule.pytorch_auto_revert_5m.name
  target_id = "pytorch-auto-revert-5m"
  arn       = aws_lambda_function.pytorch_auto_revert.arn

  input = jsonencode({
    mode      = "hud"
    dry_run   = false
    overrides = {
      bisection_limit    = 10
      clickhouse_database = "default"
    }
  })
}

Key reasonings on the decisions:

  • created a single class for parsed configs that is distinct from default configs, still allowing validating and maintaining readability and single use (one is default, another is actual);
  • Reused entry points from lambda and cli, so, mostly if not all, can be fully replicated via cli in case of firefight or debugging;
  • broken entry point for cli and lambda: one parses opts and dotfiles, another parse eventbridge. Separation of concerns and easier to read;
  • removed possibility of running without any parameters in CLI to replicate lambda - there will be multiple configurations, so, this does not make sense anymore;

@pytorch-bot pytorch-bot bot added the ci-no-td label Dec 1, 2025
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 1, 2025
@vercel
Copy link

vercel bot commented Dec 1, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Preview Updated (UTC)
torchci Ignored Ignored Preview Dec 2, 2025 11:18pm

@jeanschmidt jeanschmidt self-assigned this Dec 1, 2025
@jeanschmidt jeanschmidt marked this pull request as draft December 1, 2025 23:16
@jeanschmidt jeanschmidt marked this pull request as ready for review December 2, 2025 21:37

if opts.subcommand is None:
if check_autorevert_disabled(default_config.repo_full_name):
if config.subcommand is None or config.subcommand == "autorevert-checker":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think config.subcommand can never be None now?

)
self.secret_store_name = os.environ.get("SECRET_STORE_NAME", "")
self.secret_store_name = os.environ.get(
"SECRET_STORE_NAME", "pytorch-autorevert-secrets"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"SECRET_STORE_NAME", "pytorch-autorevert-secrets"
"SECRET_STORE_NAME", ""

I'd suggest setting it to empty by default to make local CLI backward compatible.

Otherwise, if you try to run something like

python -m pytorch_auto_revert --dry-run autorevert-checker pull --hours 24  --hud-html

without setting --secret-store-name "" or SECRET_STORE_NAME=""
it would throw ERROR:root:Failed to retrieve secrets from AWS Secrets Manager which is confusing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-no-td CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants