[AIEX] Canonicalize contiguous NpuDmaMemcpyNdOp accesses to linear form by hunhoffe · Pull Request #2924 · Xilinx/mlir-aie

hunhoffe · 2026-03-05T00:08:48Z

A row-major contiguous access pattern such as sizes=[s3, s2, s1, s0], strides=[st3, st2, st1, 1],
where st1==s0 and st2==s0*s1 (each stride condition is waived when the size of that dimension is 1), is semantically identical to a linear transfer of N=s0*s1*s2 elements.

Add a canonicalization pattern that rewrites such ops to the canonical linear form: sizes=[s3, 1, 1, N] strides=[st3, 0, 0, 1]
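To make the rewrite rule above concrete, here is a minimal Python sketch of the contiguity test and the resulting linear form. It is not the C++ pattern from this PR; the function name `linearize_if_contiguous` is hypothetical, and the logic follows only the conditions stated in the description.

```python
from typing import List, Optional, Tuple


def linearize_if_contiguous(
    sizes: List[int], strides: List[int]
) -> Optional[Tuple[List[int], List[int]]]:
    """Return the linear form of a 4-D access pattern, or None if not contiguous.

    Both lists are ordered outermost-first: [s3, s2, s1, s0] and
    [st3, st2, st1, st0]. Hypothetical helper, for illustration only.
    """
    s3, s2, s1, s0 = sizes
    st3, st2, st1, st0 = strides

    # The innermost dimension must walk consecutive elements.
    if st0 != 1:
        return None
    # d1 must step by exactly one d0 row, unless it has a single iteration.
    if s1 != 1 and st1 != s0:
        return None
    # d2 must step by exactly one (d1 x d0) block, unless it has a single iteration.
    if s2 != 1 and st2 != s0 * s1:
        return None

    n = s0 * s1 * s2
    # The outermost (repeat) dimension is carried over unchanged.
    return [s3, 1, 1, n], [st3, 0, 0, 1]


contiguous = linearize_if_contiguous([1, 1, 4, 2048], [0, 0, 2048, 1])
strided = linearize_if_contiguous([1, 1, 4, 8], [0, 0, 16, 1])
```

Here `contiguous` is the pair `([1, 1, 1, 8192], [0, 0, 0, 1])`, while `strided` is `None` because a d1 stride of 16 skips elements between rows of 8.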

In this form, isLinearTransferWithoutTransformation() returns true, so verifyStridesWraps() skips the 10-bit d0 wrap-size check. The hardware uses a wider transfer-length register in linear mode, so much larger values of N are supported.

This fixes the motivating case from issue #2825, where fill/drain on a 2D buffer (e.g. memref<M x K x bf16>) generates sizes=[1,1,M,K] with K>1023. That pattern previously failed verification as a data-layout-transform dimension, even though it is simply a contiguous linear transfer.
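The effect on the motivating case can be illustrated with a toy model of the check. This is a simplified stand-in and not the actual verifier: `is_linear` and `passes_d0_check` are hypothetical names, the strides [0, 0, K, 1] for the original pattern are an assumption, and any element-width scaling the real check may apply is ignored.

```python
D0_WRAP_BITS = 10                      # "10-bit d0 wrap-size check" from the description
D0_WRAP_MAX = (1 << D0_WRAP_BITS) - 1  # largest d0 size that fits in 10 bits


def is_linear(sizes, strides):
    """Toy stand-in for isLinearTransferWithoutTransformation().

    Matches only the canonical form sizes=[s3, 1, 1, N], strides=[st3, 0, 0, 1].
    """
    return sizes[1] == 1 and sizes[2] == 1 and list(strides[1:]) == [0, 0, 1]


def passes_d0_check(sizes, strides):
    """Toy stand-in for the d0 portion of verifyStridesWraps()."""
    if is_linear(sizes, strides):
        return True  # linear transfers skip the wrap-size check
    return sizes[3] <= D0_WRAP_MAX


M, K = 4, 2048
before = ([1, 1, M, K], [0, 0, K, 1])     # assumed strides for the 2D buffer
after = ([1, 1, 1, M * K], [0, 0, 0, 1])  # canonical linear form

before_ok = passes_d0_check(*before)
after_ok = passes_d0_check(*after)
```

Under this model `before_ok` is false, because K exceeds the 10-bit limit in a non-linear pattern, and `after_ok` is true once the pattern is in linear form.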

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

github-actions · 2026-03-05T00:36:58Z

Coverage Report

Created: 2026-03-13 03:16

Filename	Function Coverage	Line Coverage	Region Coverage	Branch Coverage
home/runner/work/mlir-aie/mlir-aie/lib/Dialect/AIEX/IR/AIEXDialect.cpp	98.21%	85.62%	88.71%	78.40%
Totals	98.21%	85.62%	88.71%	78.40%

Generated by llvm-cov -- llvm version 18.1.3

andrej

Nice clear PR description! I think this canonicalization is useful.

A row-major contiguous access pattern such as sizes=[s3, s2, s1, s0] strides=[st3, st2, st1, 1]
where st1==s0 and st2==s0*s1 (or the corresponding size is 1)

In all of the tests, whenever size == 1 the corresponding stride is 0. Does this pass also canonicalize when the stride != 0? Do we want to canonicalize in that case, or perhaps raise an error? With a size of 1 the stride is never applied, so a nonzero stride might indicate user confusion. Either way, can we add tests for some cases where size == 1 and stride != 0?

Tests for the DMA task syntax would be good too.

Is this same canonicalization also applied to the dimensions of an ObjectFifo (dimensionsToStream, dimensionsFromStream)?
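On the size == 1 question above, a small sketch that enumerates the element offsets of a pattern shows why the stride of a single-iteration dimension has no effect on the addresses touched. The helper `offsets` is hypothetical and models only the index arithmetic, not the hardware. It does not settle whether the pass should canonicalize or reject such input.

```python
from itertools import product


def offsets(sizes, strides):
    """List the element offsets visited by an N-D pattern.

    Dimensions are ordered outermost-first, so the last dimension varies fastest.
    """
    ranges = [range(s) for s in sizes]
    return [
        sum(i * st for i, st in zip(idx, strides))
        for idx in product(*ranges)
    ]


# Same sizes; only the stride of the size-1 dimension d2 differs.
zero_stride = offsets([1, 1, 4, 8], [0, 0, 8, 1])
odd_stride = offsets([1, 1, 4, 8], [0, 77, 8, 1])
same_addresses = zero_stride == odd_stride
```

The index of a size-1 dimension is always 0, so its stride is multiplied by zero in every offset and `same_addresses` is true.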

andrej · 2026-03-06T17:49:56Z

lib/Dialect/AIEX/IR/AIEXDialect.cpp

+// limit violations: in the resulting linear form, isLinearTransferWithout-
+// Transformation() returns true, so verifyStridesWraps() skips the 10-bit
+// d0 wrap-size check.  The hardware uses a wider transfer-length register in
+// linear mode, so arbitrarily large N is supported.


I think there are still limits but they are very large.

hunhoffe and others added 2 commits March 4, 2026 17:01

Merge branch 'main' into fix/linearize-contiguous-dma-memcpy-nd

8ab2109

hunhoffe mentioned this pull request Mar 5, 2026

Linear transfers (without TensorAccessPattern) shouldn't use data layout transformation dimensions #2825

Open

Merge branch 'main' into fix/linearize-contiguous-dma-memcpy-nd

e396627

andrej approved these changes Mar 6, 2026

hunhoffe added 3 commits March 9, 2026 10:45

Merge branch 'main' into fix/linearize-contiguous-dma-memcpy-nd

18e58a6

Merge branch 'main' into fix/linearize-contiguous-dma-memcpy-nd

7f3dc9a

Merge branch 'main' into fix/linearize-contiguous-dma-memcpy-nd

69379de

hunhoffe wants to merge 6 commits into main from fix/linearize-contiguous-dma-memcpy-nd

2 participants
