combined inner outer reduction used in layer norm backward #2399

@liqiangxl

🚀 The feature, motivation and pitch

Combine the inner and outer reductions into one kernel:

  1. Do a partial outer reduction while the blocks loop over the outer domain performing block inner reductions.
  2. Write the result of the partial outer reduction to gmem.
  3. Sync and reload from gmem.
  4. Remap the parallel pattern to finalize the outer reduction.

Used in ln_backward.
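
The four steps can be sketched as a serial simulation in plain Python. This is only an illustration of the data flow, not the proposed kernel: the function name, the `num_blocks` parameter, the strided row assignment, and the `gmem_partials` buffer are all assumptions made for the example, and a plain sum stands in for whatever is actually being reduced. In layer norm backward, per-row (inner) sums typically feed the input gradient, while per-column (outer) sums give the weight and bias gradients.

```python
def combined_inner_outer_reduction(x, num_blocks):
    """Serially simulate a combined inner/outer reduction (hypothetical helper).

    x is a list of rows: rows are the outer domain, columns the inner domain.
    Returns (row_sums, col_sums), both produced from one pass over the data.
    """
    n_rows, n_cols = len(x), len(x[0])
    row_sums = [0.0] * n_rows
    # Stand-in for the global-memory workspace: one partial row per block.
    gmem_partials = [[0.0] * n_cols for _ in range(num_blocks)]

    # Step 1: each simulated block strides over the outer domain. For every
    # row it visits, it does the inner reduction and also accumulates the row
    # into its own partial outer result.
    for block in range(num_blocks):
        partial = [0.0] * n_cols  # per-block accumulator
        for row in range(block, n_rows, num_blocks):
            row_sums[row] = sum(x[row])       # block inner reduction
            for col in range(n_cols):
                partial[col] += x[row][col]   # partial outer reduction
        # Step 2: write the partial outer result to "gmem".
        gmem_partials[block] = partial

    # Step 3: on a GPU a grid-wide sync would be needed here; in this serial
    # simulation every write has already completed.

    # Step 4: remap the work so that each column is finalized independently
    # by reloading and summing the per-block partials.
    col_sums = [
        sum(gmem_partials[block][col] for block in range(num_blocks))
        for col in range(n_cols)
    ]
    return row_sums, col_sums


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
row_sums, col_sums = combined_inner_outer_reduction(x, num_blocks=2)
```

The point of the single pass is that each input element is read once and contributes to both reductions. The only extra traffic is the `num_blocks` by `n_cols` partial buffer written in step 2 and reloaded in step 4.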

Alternatives

No response

Additional context

No response
