Enabled compiled autograd for backward pass #7667
Conversation
Compiled Autograd is an extension to torch.compile that enhances the autograd engine by capturing a larger backward computation graph at runtime. This allows more comprehensive optimization of the backward pass during training. Overall, a 5-20% speedup is expected in backward-heavy workloads with stable graphs. The feature is disabled by default; it can be enabled from a user script by setting `compiled_autograd_enabled=True` when invoking the engine's `compile` method.

Signed-off-by: Max Kovalenko <[email protected]>
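For reference, a minimal usage sketch of the flow described above. The model and DeepSpeed config are illustrative only, and the `compiled_autograd_enabled` keyword is assumed to match the signature added by this PR:

```python
import torch
import deepspeed

# Illustrative model/config; "ds_config.json" is assumed to define an optimizer.
model = torch.nn.Linear(1024, 1024)
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",
)

# Per this PR: compiled autograd is opt-in and defaults to False.
engine.compile(backend="inductor", compiled_autograd_enabled=True)

# Training proceeds as usual; the backward graph is captured and compiled at runtime.
x = torch.randn(8, 1024, device=engine.device)
loss = engine(x).sum()
engine.backward(loss)
engine.step()
```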
@deepcharm Thanks for the patch! Compiled autograd is not compatible with DeepCompile today, as it will override the backward graph into which DeepCompile has inserted ZeRO ops. Having both enabled causes a failure. Would you please warn the user and unset `compiled_autograd_enabled` when DeepCompile is enabled?
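For illustration only (not the PR's actual code), a sketch of the kind of guard being requested, with hypothetical helper and flag names:

```python
import logging

logger = logging.getLogger(__name__)

def resolve_compiled_autograd(compiled_autograd_enabled: bool, deepcompile_enabled: bool) -> bool:
    """Hypothetical helper: disable compiled autograd when DeepCompile is active,
    since DeepCompile rewrites the backward graph that compiled autograd would capture."""
    if compiled_autograd_enabled and deepcompile_enabled:
        logger.warning(
            "Compiled autograd is not compatible with DeepCompile; "
            "falling back to the standard autograd engine."
        )
        return False
    return compiled_autograd_enabled
```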
Signed-off-by: Max Kovalenko <[email protected]>
Signed-off-by: Max Kovalenko <[email protected]>
@eternalNight Thank you for the good catch! Updated the code per your request. Please let me know if that works.
Signed-off-by: Max Kovalenko <[email protected]>
Signed-off-by: Max Kovalenko <[email protected]>
Thanks for the update! I'm trying different combinations of compile options. When playing with this model (https://gist.github.com/eternalNight/3c2cf8c703f1e9e7742d3b7f9e1edae3), I got errors when using the eager backend with compiled autograd enabled. Are those known issues of compiled_autograd?
Thanks for the detailed testing, super helpful! These errors match known PyTorch issues with Compiled Autograd + distributed/mixed precision.
Let me know if clearing the cache resolves the second one for you. BTW, what's your PyTorch version/setup?
I'm using torch 2.7.1, and the model has torch autocast enabled by default, so does that mean compiled autograd should not be used with autocast now? Using a bf16 model leads me to a different error. This is how the graph looks.
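A minimal stock-PyTorch sketch (independent of DeepSpeed) for probing the autocast/bf16 interaction; the `torch._dynamo.config.compiled_autograd` flag follows the PyTorch compiled-autograd tutorial and is assumed to be available in recent 2.x builds:

```python
import torch

# Enable compiled autograd globally, per the PyTorch tutorial.
torch._dynamo.config.compiled_autograd = True

model = torch.nn.Linear(64, 64).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

@torch.compile(backend="eager")
def train_step(x):
    # Forward under autocast/bf16; backward is captured by compiled autograd.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(x).sum()
    loss.backward()
    return loss

loss = train_step(torch.randn(8, 64, device="cuda"))
opt.step()
```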
I've removed /tmp/torchinductor_root/ (which is where inductor caches generated graphs on my side), but the error persists.
I'm struggling to find a working example for DeepSpeed + compiled autograd. If you have a working model at hand, would you please include it as a unit test in this PR as well, so that we can test its benefits? Thanks.
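Something along these lines could serve as a starting point for such a test. It is a sketch only; the config values and the `compiled_autograd_enabled` keyword are assumptions based on this PR's description:

```python
import pytest
import torch
import deepspeed

@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires CUDA")
def test_backward_with_compiled_autograd():
    config = {
        "train_batch_size": 8,
        "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    }
    model = torch.nn.Sequential(
        torch.nn.Linear(32, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1)
    )
    engine, _, _, _ = deepspeed.initialize(
        model=model, model_parameters=model.parameters(), config=config
    )
    engine.compile(backend="inductor", compiled_autograd_enabled=True)

    # One training step: forward, compiled backward, optimizer step.
    x = torch.randn(8, 32, device=engine.device)
    loss = engine(x).sum()
    engine.backward(loss)
    engine.step()
    assert torch.isfinite(loss)
```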
Compiled Autograd is an extension to torch.compile that enhances the autograd engine by capturing a larger backward computation graph at runtime. This allows more comprehensive optimization of the backward pass during training.
Overall, a 5-20% speedup is expected in backward-heavy workloads with stable graphs.
The feature is disabled by default; it can be enabled from a user script by setting `compiled_autograd_enabled=True` when invoking the engine's `compile` method.
Note that bfloat16 + eager backend requires PyTorch >= 2.5 (where partial fixes landed), or disabling compiled autograd for bfloat16 models, due to a known PyTorch bug in torch.compile (PyTorch #152162 / #161153).
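An illustrative guard for the note above (not part of this PR; the helper name is hypothetical):

```python
import torch
from packaging import version

def compiled_autograd_ok_for(dtype: torch.dtype) -> bool:
    """Hypothetical check: only allow compiled autograd for bf16 models on PyTorch >= 2.5."""
    if dtype == torch.bfloat16:
        torch_version = version.parse(torch.__version__.split("+")[0])
        return torch_version >= version.parse("2.5")
    return True
```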