[AIEX] Canonicalize contiguous NpuDmaMemcpyNdOp accesses to linear form #2924
A row-major contiguous access pattern such as
sizes=[s3, s2, s1, s0] strides=[st3, st2, st1, 1]
where st1==s0 and st2==s0*s1 (or the corresponding size is 1) is semantically identical to a linear transfer of N=s0*s1*s2 elements.
Add a canonicalization pattern that rewrites such ops to the canonical linear form:
sizes=[s3, 1, 1, N] strides=[st3, 0, 0, 1]
In this form, isLinearTransferWithoutTransformation() returns true, so verifyStridesWraps() skips the 10-bit d0 wrap-size check. The hardware uses a wider transfer-length register in linear mode, so much larger values of N are supported.
This fixes the motivating case from issue #2825, where fill/drain on a 2D buffer (e.g. memref<M x K x bf16>) generates sizes=[1,1,M,K] with K>1023. That pattern previously failed verification as a data-layout-transform dimension, although it is simply a contiguous linear transfer.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
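The contiguity condition and the rewrite described above can be sketched in a few lines of standalone C++. This is a minimal illustration, not the pattern added by this PR: AccessPattern and linearize are hypothetical names, sizes and strides are assumed to be static and listed outermost first, and the sketch assumes that the stride of a size-1 dimension is ignored whatever its value.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the op's static sizes and strides.
// Index 0 is the outermost dimension (d3), index 3 the innermost (d0).
struct AccessPattern {
  std::array<int64_t, 4> sizes;
  std::array<int64_t, 4> strides;
};

// Returns the linear form when the inner three dimensions are row-major
// contiguous, and std::nullopt otherwise.
std::optional<AccessPattern> linearize(const AccessPattern &p) {
  for (int64_t s : p.sizes)
    assert(s >= 1 && "this sketch assumes positive sizes");

  // The innermost dimension must step one element at a time.
  if (p.strides[3] != 1)
    return std::nullopt;

  // n is the number of contiguous elements covered so far, which is also
  // the stride the next dimension outward must have to stay contiguous.
  int64_t n = p.sizes[3];
  for (int i = 2; i >= 1; --i) {
    // Assumption: a size-1 dimension never applies its stride, so any
    // stride value is accepted for it.
    if (p.sizes[i] != 1 && p.strides[i] != n)
      return std::nullopt;
    n *= p.sizes[i];
  }

  // The outermost size and stride are carried over unchanged.
  return AccessPattern{{p.sizes[0], 1, 1, n}, {p.strides[0], 0, 0, 1}};
}
```

For the motivating shape with M=64 and K=2048, linearize applied to sizes=[1,1,64,2048] strides=[0,0,2048,1] yields sizes=[1,1,1,131072] strides=[0,0,0,1]. A row stride of 4096 instead of 2048 leaves gaps between rows, so the sketch returns std::nullopt for it.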
Coverage Report (created 2026-03-13 03:16)
Generated by llvm-cov -- llvm version 18.1.3
Nice clear PR description! I think this canonicalization is useful.
A row-major contiguous access pattern such as sizes=[s3, s2, s1, s0] strides=[st3, st2, st1, 1]
where st1==s0 and st2==s0*s1 (or the corresponding size is 1)
For all tests, if size == 1 the corresponding stride == 0. Does this pass also canonicalize if the stride != 0? Do we want to canonicalize in that case, or perhaps error (since with a size of 1, the stride will never be applied, so might indicate user confusion)? Either way, can we add a test for some cases where size == 1 and stride != 0?
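To make the size == 1 case concrete, here is a minimal standalone sketch that enumerates the element offsets a 4-D pattern visits. It models the addressing as a plain nested loop, which is an assumption rather than a description of the hardware, and offsets is a hypothetical helper name.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Enumerates the element offsets visited by a 4-D access pattern.
// Index 0 is the outermost dimension, index 3 the innermost.
std::vector<int64_t> offsets(const std::array<int64_t, 4> &sizes,
                             const std::array<int64_t, 4> &strides) {
  std::vector<int64_t> out;
  for (int64_t i3 = 0; i3 < sizes[0]; ++i3)
    for (int64_t i2 = 0; i2 < sizes[1]; ++i2)
      for (int64_t i1 = 0; i1 < sizes[2]; ++i1)
        for (int64_t i0 = 0; i0 < sizes[3]; ++i0)
          // A size-1 dimension only ever has index 0, so its stride is
          // multiplied by zero and never contributes to the offset.
          out.push_back(i3 * strides[0] + i2 * strides[1] +
                        i1 * strides[2] + i0 * strides[3]);
  return out;
}
```

Under this model, sizes=[1,1,4,8] with strides=[0,0,8,1] and with strides=[0,77,8,1] visit the same offsets, and both match the linear form sizes=[1,1,1,32] strides=[0,0,0,1]. A nonzero stride on a size-1 dimension is therefore harmless for the data moved, which leaves the choice between canonicalizing and reporting an error a matter of policy.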
Tests for the DMA task syntax would be good too.
Is this same canonicalization also applied to the dimensions of an ObjectFifo (dimensionsToStream, dimensionsFromStream)?
// limit violations: in the resulting linear form,
// isLinearTransferWithoutTransformation() returns true, so
// verifyStridesWraps() skips the 10-bit d0 wrap-size check. The hardware uses
// a wider transfer-length register in linear mode, so arbitrarily large N is
// supported.
I think there are still limits but they are very large.