stas00 commented Nov 3, 2025

Make it very clear that `TiledMLP`'s memory saving has a cost of recomputing forward.

stas00 enabled auto-merge (squash) November 3, 2025 16:53
stas00 merged commit 76a4075 into master Nov 3, 2025
12 checks passed
stas00 deleted the stas00-patch-1 branch November 3, 2025 18:47

stas00 commented Nov 3, 2025

Thank you, Masahiro

kidlj commented Nov 4, 2025

I just noticed that there's a typo in this commit: occurs trice.

stas00 commented Nov 4, 2025

That's not a typo, the forward does occur three times:

  1. normal forward
  2. activation checkpointing forward
  3. backward's internal forward-like recomputation.

In a normal non-tiled module, only steps 1 and 2 occur.

This is the price of saving memory: about 25% more computation.
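The thread does not spell out where the 25% comes from. One accounting that reproduces it is sketched below. It assumes (this is not stated above) that a backward pass costs roughly twice a forward pass, and it compares the three forward-like passes of `TiledMLP` against the two passes of a non-tiled module under activation checkpointing. The names `FORWARD`, `BACKWARD` and `step_cost` are hypothetical and exist only for this illustration.

```python
# Cost model in units of one forward pass.
# Assumption (not from the thread): backward costs about 2x forward.
FORWARD = 1.0
BACKWARD = 2.0 * FORWARD


def step_cost(num_forward_passes: int) -> float:
    """Cost of one training step with the given number of forward-like passes."""
    return num_forward_passes * FORWARD + BACKWARD


# Non-tiled module with activation checkpointing:
# normal forward + checkpointing forward.
non_tiled = step_cost(2)

# TiledMLP: the same two passes plus the forward-like
# recomputation that happens inside backward.
tiled = step_cost(3)

overhead = tiled / non_tiled - 1.0
print(f"non-tiled: {non_tiled}, tiled: {tiled}, overhead: {overhead:.0%}")
```

Under a different forward-to-backward cost ratio, or without activation checkpointing as the baseline, the percentage would differ, so treat the figure as an estimate for this specific comparison.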

aeeeeeep pushed a commit to aeeeeeep/DeepSpeed that referenced this pull request Nov 13, 2025
Make it very clear that `TiledMLP`'s memory saving has a cost of
recomputing forward.

Signed-off-by: aeeeeeep <[email protected]>

4 participants