Remove the hack for SAC + FlexAttention #2118
Conversation
tianyu-l left a comment
LGTM. I wonder if you have verified it "works", i.e. it doesn't invalidate SAC / compile anymore?
  [activation_checkpoint]
  mode = "selective"  # ["none", "selective", "full"]
- selective_ac_option = '2'   # 'int' = ac every positive int layer or 'op', ac based on ops policy
+ selective_ac_option = 'op'  # 'int' = ac every positive int layer or 'op', ac based on ops policy
Is this intended? I think it's OK to change this but want to confirm.
Oops, I accidentally committed this.
Yes, I verified it works with llama3 and llama4. FlexAttention is compiled within SAC.
soulitzer left a comment
Nice!
options={
    "wrap_inductor_compiled_regions": True,
    "max_autotune": True,
    "coordinate_descent_tuning": True,
Noob question: is this coordinate_descent_tuning also part of the "mode=max-autotune-no-cudagraphs" -> "options={...}" change?
forgot to ask: what's the context of this change to "options={}"?
We cannot pass mode and options at the same time; torch.compile forbids it. According to here, max-autotune-no-cudagraphs is equivalent to these three options.
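A minimal sketch of what this comment describes, using a toy function: torch.compile rejects passing mode= and options= together, so the Inductor options corresponding to "max-autotune-no-cudagraphs" are spelled out directly. The option names are taken from the diff above; whether they exist depends on the installed PyTorch build, and the function and shapes are illustrative only.

```python
import torch

def toy(x):
    return torch.relu(x) @ x.t()

# torch.compile(toy, mode="max-autotune-no-cudagraphs", options={...}) would raise,
# so the equivalent Inductor options are passed explicitly instead.
compiled_toy = torch.compile(
    toy,
    options={
        "wrap_inductor_compiled_regions": True,  # option name taken from the PR diff
        "max_autotune": True,
        "coordinate_descent_tuning": True,
    },
)

out = compiled_toy(torch.randn(16, 16))
```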
Stack from ghstack (oldest at bottom):
PyTorch now supports torch.compile inside the SAC region even when torch.compile is not used to wrap SAC. This PR removes the workaround that was needed to make torch.compile work with FlexAttention.
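A hedged sketch of the pattern this description refers to (not the torchtitan code itself): flex_attention is compiled on its own and then called inside a selectively checkpointed region, without wrapping the SAC region in torch.compile. The mm/bmm save policy, tensor shapes, and the extra linear layer are illustrative assumptions.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention
from torch.utils.checkpoint import (
    CheckpointPolicy,
    checkpoint,
    create_selective_checkpoint_contexts,
)

# Compile FlexAttention itself; the surrounding SAC region stays eager.
compiled_flex = torch.compile(flex_attention)

# Illustrative SAC policy: save matmul outputs, recompute everything else.
def policy_fn(ctx, op, *args, **kwargs):
    if op in (torch.ops.aten.mm.default, torch.ops.aten.bmm.default):
        return CheckpointPolicy.MUST_SAVE
    return CheckpointPolicy.PREFER_RECOMPUTE

def attn_block(q, k, v):
    out = compiled_flex(q, k, v)  # compiled region called inside the SAC region
    return torch.nn.functional.linear(out, w)

device = "cuda"  # FlexAttention kernels target GPU
q, k, v = (torch.randn(2, 4, 128, 64, device=device, requires_grad=True) for _ in range(3))
w = torch.randn(64, 64, device=device, requires_grad=True)

out = checkpoint(
    attn_block, q, k, v,
    use_reentrant=False,
    context_fn=lambda: create_selective_checkpoint_contexts(policy_fn),
)
out.sum().backward()
```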