[Codegen] Emulate `gather_to_lds` when it has narrow element types by lialan · Pull Request #23758 · iree-org/iree

lialan · 2026-03-12T19:25:27Z

First step to support DMA for scaled GEMMs.

Add a ConvertGatherToLDS pattern to the AMDGPUEmulateNarrowType pass.
For the gather_to_lds op, the pattern rewrites sub-byte element types to i8, e.g. vector<32xf4E2M1FN> -> vector<16xi8>.
The op is semantically the same before and after the rewrite.
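To make the type adjustment concrete, here is a minimal sketch of the arithmetic behind `vector<32xf4E2M1FN> -> vector<16xi8>`. The helper name `packedElementCount` is hypothetical and is not taken from the PR; it only assumes that the total bit width of the transfer must be preserved when the element type is widened.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper (not from the PR): given a transfer of `numElems`
// elements that are `origBits` wide, return the element count after the
// element type is widened to `newBits`, or std::nullopt when the transfer
// cannot be repacked without changing its total size.
std::optional<int64_t> packedElementCount(int64_t numElems, unsigned origBits,
                                          unsigned newBits) {
  if (numElems <= 0 || origBits == 0 || newBits == 0)
    return std::nullopt;
  const int64_t totalBits = numElems * static_cast<int64_t>(origBits);
  // The total transfer size must be a whole number of new elements.
  if (totalBits % static_cast<int64_t>(newBits) != 0)
    return std::nullopt;
  return totalBits / static_cast<int64_t>(newBits);
}
```

For the example in the description, `packedElementCount(32, 4, 8)` yields 16: 32 four-bit elements occupy 128 bits, which is 16 bytes.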

Copilot

Pull request overview

Adds support in the AMDGPU narrow-type emulation pipeline to rewrite amdgpu.gather_to_lds when its source/destination memrefs are converted from sub-byte element types to byte-sized (i8) types, enabling upcoming DMA support for scaled GEMMs.

Changes:

- Add a ConvertGatherToLDS conversion pattern to rewrite amdgpu.gather_to_lds for sub-byte element types.
- Linearize multidimensional indices into a 1D packed-byte index and adjust the transfer vector type accordingly.
- Extend MLIR FileCheck coverage for gather_to_lds conversions (including async forms and various sub-byte element types).
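The index linearization mentioned above can be sketched on plain integers. This is an illustration only, not the PR's code: `packedByteIndex` is a hypothetical name, the layout is assumed to be contiguous row-major with zero offset, and rejecting starts that are not byte-aligned is an assumption of this sketch rather than something the review states.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical sketch: compute the row-major linear element index of
// `indices` within `shape`, then convert it to an index into a buffer of
// 8-bit elements. `origBits` is the sub-byte element width (e.g. 4).
std::optional<int64_t> packedByteIndex(const std::vector<int64_t> &indices,
                                       const std::vector<int64_t> &shape,
                                       unsigned origBits) {
  const unsigned byteBits = 8;
  if (indices.size() != shape.size() || origBits == 0 ||
      byteBits % origBits != 0)
    return std::nullopt;
  int64_t linear = 0;
  for (std::size_t i = 0; i < indices.size(); ++i) {
    if (indices[i] < 0 || indices[i] >= shape[i])
      return std::nullopt;
    linear = linear * shape[i] + indices[i];
  }
  const int64_t elemsPerByte = byteBits / origBits;
  // Assumption of this sketch: the access must start on a byte boundary.
  if (linear % elemsPerByte != 0)
    return std::nullopt;
  return linear / elemsPerByte;
}
```

For a 4x32 buffer of 4-bit elements, index (2, 8) is linear element 72, which maps to byte 36 because two elements share each byte.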

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
compiler/src/iree/compiler/Codegen/LLVMGPU/test/amdgpu_emulate_narrow_type.mlir	Adds FileCheck tests for `gather_to_lds` sub-byte element type conversion to `i8`.
compiler/src/iree/compiler/Codegen/LLVMGPU/AMDGPUEmulateNarrowType.cpp	Introduces `ConvertGatherToLDS` pattern to linearize/pack indices and update transfer types during narrow type emulation.

compiler/src/iree/compiler/Codegen/LLVMGPU/AMDGPUEmulateNarrowType.cpp

lialan · 2026-03-12T20:23:24Z

@copilot open a new pull request to apply changes based on the comments in this thread

Copilot · 2026-03-12T20:23:32Z

@lialan I've opened a new pull request, #23763, to work on those changes. Once the pull request is ready, I'll request review from you.

…emulation (#23763)

`ConvertGatherToLDS` had several correctness issues in `linearizeAndPack` that could silently produce wrong IR or crash in non-assert builds.

**Fixes:**

- **Offset ignored**: Memref layout offset was never incorporated into the linearized index. Now checks for dynamic offset (fails the pattern) and initializes `linearIdx = offset + sum(idx[i] * stride[i])`.
- **Silent rank mismatch**: `llvm::zip(indices, strides)` silently truncated when sizes differed. Added explicit `indices.size() != strides.size()` guard.
- **Assert in rewrite path**: `assert(newBits > origBits && newBits % origBits == 0)` would crash in debug and silently miscompile in release. Replaced with `return nullptr` (propagated as `notifyMatchFailure` by callers).
- **Misleading error message**: `"not a multiple of byte width"` described the wrong invariant; corrected to `"not a multiple of the new element bit width"` to match the actual check (`totalBits % newSrcBits != 0`).
- **Unsafe early return**: Removed the `origBits == newBits && 1D` fast-path that bypassed offset handling entirely.

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: lialan <[email protected]>
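The offset and rank fixes described in this commit message can be illustrated with static integers. The sketch below is not the PR's `linearizeAndPack`; the function name `linearizeWithOffset` and the sentinel `kDynamicSketch` are hypothetical stand-ins. It shows the stated formula `linearIdx = offset + sum(idx[i] * stride[i])`, an explicit size check in place of a silently truncating zip, and a failure value in place of an assert.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical sentinel marking an offset or stride that is unknown at
// compile time; used only by this sketch.
constexpr int64_t kDynamicSketch = std::numeric_limits<int64_t>::min();

// Returns offset + sum(indices[i] * strides[i]), or std::nullopt when the
// layout cannot be handled statically. Returning a failure value lets a
// caller report a match failure rather than assert or miscompile.
std::optional<int64_t> linearizeWithOffset(const std::vector<int64_t> &indices,
                                           const std::vector<int64_t> &strides,
                                           int64_t offset) {
  // Reject dynamic offsets instead of treating them as zero.
  if (offset == kDynamicSketch)
    return std::nullopt;
  // Explicit rank check: iterating both ranges in lockstep would otherwise
  // stop at the shorter one without any diagnostic.
  if (indices.size() != strides.size())
    return std::nullopt;
  int64_t linearIdx = offset;
  for (std::size_t i = 0; i < indices.size(); ++i) {
    if (strides[i] == kDynamicSketch)
      return std::nullopt;
    linearIdx += indices[i] * strides[i];
  }
  return linearIdx;
}
```

With indices (1, 2), strides (32, 1), and offset 4, the result is 4 + 32 + 2 = 38; dropping the offset, as the original code did, would give 34.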

compiler/src/iree/compiler/Codegen/LLVMGPU/AMDGPUEmulateNarrowType.cpp

* Remove unnecessary cast<MemRefType> on op accessors (already typed)
* Pass async attribute directly to GatherToLDSOp builder
* Add comment explaining dynamic offset/stride rejection
* Add assert for transfer size divisibility in convertTransferType

Co-Authored-By: Claude Opus 4.6 <[email protected]>

lialan · 2026-03-13T01:52:39Z

@krzysz00 no offence, I was testing Claude automation, and it replied to your review comments all by itself.

[Codegen] Support gather_to_lds with sub-byte types in narrow type em…

5c5a70d

…ulation

First step to support DMA for scaled GEMMs.

* Add ConvertGatherToLDS pattern to AMDGPUEmulateNarrowType pass.
* Adjust subbyte element type to i8. e.g. vector<32xf4E2M1FN> -> vector<16xi8>

lialan force-pushed the users/lialan/subbyte_gather_to_lds branch from 25dcfcf to 5c5a70d Compare March 12, 2026 19:31

lialan changed the title ~~[Codegen] Emulate gather_to_lds when it has narrow sub-byte element types~~ [Codegen] Emulate gather_to_lds when it has narrow element types Mar 12, 2026

lialan requested a review from Copilot March 12, 2026 20:04

Copilot started reviewing on behalf of lialan March 12, 2026 20:07

Copilot AI reviewed Mar 12, 2026

Copilot AI mentioned this pull request Mar 12, 2026

[Codegen] Fix correctness issues in ConvertGatherToLDS narrow type emulation #23763

Merged

Copilot AI and others added 2 commits March 12, 2026 16:33

Another fix.

026613b

lialan marked this pull request as ready for review March 13, 2026 00:49

lialan requested review from Groverkss, Max191, krzysz00, kuhar, nirvedhmeshram and qedawkins as code owners March 13, 2026 00:49

krzysz00 reviewed Mar 13, 2026

lialan wants to merge 4 commits into main from users/lialan/subbyte_gather_to_lds

4 participants
