Skip to content

Conversation

@akashveramd
Copy link
Collaborator

In this PR, I implemented ciflow/rocm. The changes are part of integration_test_8gpu_features.yaml. The workflow still supports running on pull_request (without any PR label) for CUDA. However, along with push to main and cron schedule, with the ciflow/8gpu label added to PR, the workflow runs for both CUDA & ROCm.

@akashveramd akashveramd self-assigned this Dec 5, 2025
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Dec 5, 2025
@tianyu-l
Copy link
Contributor

tianyu-l commented Dec 5, 2025

with the ciflow/8gpu label added to PR, the workflow runs for both CUDA & ROCm.

Can this label be automatically added? If not, what happens if it is not manually added -- is rocm test skipped?

@akashveramd
Copy link
Collaborator Author

with the ciflow/8gpu label added to PR, the workflow runs for both CUDA & ROCm.

Can this label be automatically added? If not, what happens if it is not manually added -- is rocm test skipped?

If the label is not added manually then the ROCm tests will not run. CUDA tests can still run normally.
@huydhn: Please correct me if I am wrong, but we wanted to add label manually by code owners due to security risk, similar to how it works in PyTorch?

@huydhn
Copy link
Contributor

huydhn commented Dec 6, 2025

If the label is not added manually then the ROCm tests will not run. CUDA tests can still run normally. @huydhn: Please correct me if I am wrong, but we wanted to add label manually by code owners due to security risk, similar to how it works in PyTorch?

For the context, ciflow bot only works if the PR author has write permission to the repo or if the CI has already been approved to run https://github.com/pytorch/test-infra/blob/main/torchci/lib/bot/ciflowPushTrigger.ts#L176-L183. This policy applies to every repos under PyTorch.

  • In the first case, it's perfectly ok to automatically add ciflow label.
  • In the second case, even if you have the mechanism to add ciflow automatically, the bot will remove the label from the PR if the author is an unknown contributor with only read permission. When this happens, the reviewer needs to review the PR, approve CI run, then add the label manually anyway. This is the manual process that I'm referring to.

So, let's be clear that adding the label automatically is doable, but only in the first case for privilege users. The mechanism to automatically add the label is independent from the change in this PR, so you can do it in a separate PR later. We need:

  1. A labeler.yml rule like https://github.com/pytorch/pytorch/blob/main/.github/labeler.yml#L11. This works in most cases, e.g. the PR change a file and ciflow label is added
  2. For more complex logic like using regex, we might need to update autoLabelbot at https://github.com/pytorch/test-infra/blob/main/torchci/lib/bot/autoLabelBot.ts#L26

One note is that when auto label rule is in place, having both the ciflow push trigger and pull_request triggers feel redundant, and the latter can be removed.

@@ -0,0 +1,2 @@
ciflow_push_tags:
- ciflow/8gpu
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does ciflow_push_tags do?
why it's called 8gpu?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ciflow_push_tags tells the bot which labels should map to git tags that trigger CI workflows.

It's called 8gpu following Huy's suggestion on slack https://pytorch.slack.com/archives/C0A0E8ARXPZ/p1764843858514179

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems adding ciflow/8gpu label in a PR will automatically create a tag ciflow/8gpu/[PR_number]. I say so because I observed https://github.com/pytorch/torchtitan/releases/tag/ciflow%2F8gpu%2F2125

What is this ciflow_push_tags for? Is it for letting the bot know that the above happened and which tag to look at?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct. As per my understanding, when we add PR label, it'll automatically create & push tag. Because the ciflow_push_tags config tells the bot to create & push the matching tags so that tag-based CI workflows can run.

Copy link
Contributor

@tianyu-l tianyu-l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true that

  • for each PR a codeowner needs to explicitly add the ciflow/8gpu label, so that rocm test could be run?
  • if a codeowner doesn't do this, the PR can be merged without rocm test passing?
  • after CUDA test is run / finished, if we add the label, we can run rocm test without rerun the CUDA test?

@@ -0,0 +1,2 @@
ciflow_push_tags:
- ciflow/8gpu
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems adding ciflow/8gpu label in a PR will automatically create a tag ciflow/8gpu/[PR_number]. I say so because I observed https://github.com/pytorch/torchtitan/releases/tag/ciflow%2F8gpu%2F2125

What is this ciflow_push_tags for? Is it for letting the bot know that the above happened and which tag to look at?

Comment on lines +6 to +7
tags:
- ciflow/8gpu/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this for -- why do we need both pull_request and tags?

Copy link
Collaborator Author

@akashveramd akashveramd Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my understanding the PR workflows and tag workflows are totally independent. Tags provides CI flow, meaning tags can be pushed to trigger CI runs on specific commits even after the PR is closed. They can also be used for versioning releases.
pull_request are used to run workflows when the PR is open.

Comment on lines +2 to +6
- .ci/docker/**
- .github/workflows/**
- scripts/**
- tests/**
- torchtitan/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are these for?

IS_SCHEDULE: ${{ github.event_name == 'schedule' }}
IS_PR: ${{ github.event_name == 'pull_request' }}
HAS_8GPU_LABEL: ${{ github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'ciflow/8gpu') }}
IS_8GPU_TAG: ${{ startsWith(github.ref, 'refs/tags/ciflow/8gpu/') }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this, if we already have the one above?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/rocm ciflow/8gpu CLA Signed This label is managed by the Meta Open Source bot. module: rocm

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants