Conversation

@krzysz00 (Collaborator) commented Sep 14, 2023

Notes to reviewers:

  • The first four commits (related to DistinctAttr and the handling of AliasScopeAttr and the like) are direct ports from upstream: I figured I'd port them in now because the new solution is nicer and it'll save us the pain during the next upstream merge.
  • The three commits after that (related to adding nontemporal to vector.load and vector.store, improving the -gpu-to-llvm attribute handling, and adding !invariant.load to llvm.load) are mine and are upstreaming candidates that should be reviewed as such.
  • The last commit is all the internal plumbing, both in terms of -rock-analyze-memory-use and -rock-prepare-llvm. I've copied out the commit message for that last commit below.

A lot of our existing annotation scheme for memory accesses was
scattered all over the place. This commit unifies things into
-rock-analyze-memory-use.

This pass covers:

  1. Walking through global_load and global_store operations in order to
    find all the "needs 64 bit index" attributes and moving that
    declaration to the function level. That way, we avoid issues stemming
    from the fact that that detection currently runs in
    blockwise-gemm-to-threadwise.
  2. Detecting readonly and writeonly arguments. This is done by looking
    for arguments that (ignoring views) only ever feed global_load or
    global_store(set), respectively, and then putting those annotations
    onto the function arguments for lowering to LLVM.
    2b. Setting nontemporal flags on stores to writeonly values. If we
    won't be reading it, we want the write to skip cache, and nontemporal
    is a hint to make that happen. (It doesn't apply to buffer stores yet
    since we don't have address space 7, but it's better than nothing).
  3. Setting all sorts of relevant LLVM annotations (noalias, nonnull,
    noundef, dereferenceable, and so on) in order to promise the compiler
    things we know are true anyway (if they're not, that's on the
    client). A sketch of this step follows the list.
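
To illustrate step 3, here is a minimal sketch (a hypothetical helper, not the pass's actual code), assuming the MLIR convention of "llvm."-prefixed argument attributes that the GPU-to-LLVM lowering copies onto llvm.func arguments; the exact attribute set the lowering supports may differ:

    #include "mlir/Dialect/Func/IR/FuncOps.h"
    #include "mlir/IR/Builders.h"

    using namespace mlir;

    // Hypothetical helper: promise that argument `argNo` is a valid, defined,
    // non-aliasing pointer, optionally readonly and of known size.
    static void annotateArg(func::FuncOp func, unsigned argNo, bool readonly,
                            int64_t knownSizeInBytes) {
      Builder b(func.getContext());
      func.setArgAttr(argNo, "llvm.noalias", b.getUnitAttr());
      func.setArgAttr(argNo, "llvm.nonnull", b.getUnitAttr());
      func.setArgAttr(argNo, "llvm.noundef", b.getUnitAttr());
      if (readonly)
        func.setArgAttr(argNo, "llvm.readonly", b.getUnitAttr());
      if (knownSizeInBytes > 0)
        func.setArgAttr(argNo, "llvm.dereferenceable",
                        b.getI64IntegerAttr(knownSizeInBytes));
    }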

Then, this commit also expands our LLVM preparation pass.
The pass is made to run on llvm.func ops, and we use the nesting
mechanism to ensure that we run that on functions inside GPU modules
specifically.

The preparation pass is updated to:

  1. Update the alignment of vector loads and stores to be the alignment
    of the vector, not its component, since MLIR's lowering is
    pessimistic in a way we don't want to be (first sketch below).
  2. Set !invariant.load on loads from readonly values to give out even
    stronger hints about the ability to hoist or sink operations.
  3. Work around a shortcoming in how the AMDGPU backend handles
    noalias (specifically, it gets dropped on kernel arguments during
    target-specific lowering) by adding our own alias scopes, which
    indicate that loads from one pointer or buffer can't alias the loads
    from the other ones (second sketch below). This will hopefully reduce
    the number of scheduling DAG edges and therefore give us some of our
    compile-time performance back ... or at least it'll give us better
    codegen thanks to more accurate alias information.
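
To make steps 1 and 3 concrete, here are two hedged sketches. Neither is the pass's actual code; the accessor names are the usual tablegen-generated ones and are assumptions.

First sketch (step 1): widen a vector llvm.load's alignment from the element's to the full vector's, where `dataLayout` is assumed to be the mlir::DataLayout of the enclosing module.

    static void widenVectorLoadAlignment(LLVM::LoadOp load,
                                         const DataLayout &dataLayout) {
      auto vecTy = dyn_cast<VectorType>(load.getResult().getType());
      if (!vecTy)
        return;
      // MLIR's default lowering uses the element alignment; we know the
      // buffers are aligned to the full vector.
      uint64_t vecAlign = dataLayout.getTypeABIAlignment(vecTy);
      if (load.getAlignment().value_or(1) < vecAlign)
        load.setAlignment(vecAlign);
    }

Second sketch (step 3): one alias scope per pointer argument, built with the DistinctAttr-based attributes ported in this PR; a load from argument i then carries scope i plus a noalias list of every other argument's scope.

    static SmallVector<Attribute> buildArgAliasScopes(MLIRContext *ctx,
                                                      unsigned numPtrArgs) {
      Builder b(ctx);
      auto domain = LLVM::AliasScopeDomainAttr::get(
          DistinctAttr::create(b.getUnitAttr()),
          b.getStringAttr("kernel args"));
      SmallVector<Attribute> scopes;
      for (unsigned i = 0; i < numPtrArgs; ++i)
        scopes.push_back(LLVM::AliasScopeAttr::get(
            DistinctAttr::create(b.getUnitAttr()), domain));
      return scopes;
    }

    static void tagLoad(LLVM::LoadOp load, unsigned argNo,
                        ArrayRef<Attribute> scopes) {
      Builder b(load.getContext());
      // The load is in its own argument's scope...
      load.setAliasScopesAttr(b.getArrayAttr(scopes[argNo]));
      // ...and declared noalias with every other argument's scope.
      SmallVector<Attribute> others;
      for (size_t i = 0, e = scopes.size(); i != e; ++i)
        if (i != argNo)
          others.push_back(scopes[i]);
      load.setNoaliasScopesAttr(b.getArrayAttr(others));
    }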

This commit also adds the "nontemporal" attribute to rock.global_store
so that -rock-sugar-to-loops can set it on the underlying stores.
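
As a hedged illustration of that plumbing (the names here are assumptions: the attribute is read generically, and setNontemporal is the accessor the upstreamed vector.store change would generate):

    // Inside the rock.global_store lowering: forward the flag onto the
    // vector store the op expands into.
    bool nontemporal = globalStoreOp->hasAttr("nontemporal");
    auto store = rewriter.create<vector::StoreOp>(loc, valueToStore, buffer,
                                                  storeIndices);
    if (nontemporal)
      store.setNontemporal(true); // lowers to !nontemporal on llvm.store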

@krzysz00 (Collaborator, Author) commented:

Also, @giuseros, if you've got the command for benchmarking that post-large-tensors slowdown handy, I can run it on this change.

@krzysz00 (Collaborator, Author) commented:

Update: preliminary perf checks are showing that setting nontemporal is usually, but not uniformly, a bad idea in our contexts, so that part of the PR will probably be removed (though I'll keep the upstream changes)

@krzysz00 krzysz00 force-pushed the memory-analysis branch 2 times, most recently from 6e2cfa7 to 97f9201 on September 20, 2023

@giuseros (Contributor) left a comment:

The code looks fine; I am a bit worried about introducing further analyses in the back-end that might slow down compilation. In particular, I am worried about the invariant.load attribute (see rust-lang/rust#103070). Could you try tuning some configurations and verify that the compile time does not get too bad?

    return;
  unsigned argNo = funcArg.getArgNumber();
  if (auto load = dyn_cast<LLVM::LoadOp>(aliasOp))
    load.setInvariantLoad(isReadonly[argNo]);

@giuseros (Contributor) commented on the diff above:

Not sure how relevant this is, but I stumbled upon rust-lang/rust#103070, which talks about invariant.start. Do you mind making sure that the compilation time is not heavily affected by this?

  op.setFailureOrdering(LLVM::AtomicOrdering::monotonic);
});

// 4. TODO: add some invariant.start calls once MLIR's got them.

@giuseros (Contributor) commented on the diff above:

See my comment above :)

@giuseros (Contributor) commented:

> Also, @giuseros, if you've got the command for benchmarking that post-large-tensors slowdown handy, I can run it on this change.

This is the command I was using to investigate compilation slowdowns:

time python3 ./bin/tuningRunner.py --op gemm --config="-g 64 -m 1024 -k 1024 -n 384 -t f32 -out_datatype f32 -transA 0 -transB 0"

@manupak (Contributor) commented Sep 27, 2023

It'd be much easier to review if you could separate the things you port from upstream from what you've implemented new in this PR, because the former would be a no-brainer to get in.

@krzysz00 (Collaborator, Author) commented:

@manupak You can filter the review view by commit, if that helps

@giuseros (Contributor) left a comment:

LGTM!

@krzysz00 krzysz00 force-pushed the memory-analysis branch 2 times, most recently from 38eeb42 to ee74198 on December 6, 2023
Mogball and others added 5 commits December 20, 2023 16:03

The `allocsize` attribute is weird because it packs two 32-bit values
into a 64-bit value. It also turns out that the passthrough attribute
exporter was using `int`, which handles 64-bit integers incorrectly.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D156574
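
For context, the packing being described looks roughly like this (a from-memory illustration of LLVM's scheme, not code from the commit; the real version also reserves a sentinel for a missing second index):

    #include <cstdint>

    // allocsize carries the index of the size argument in the high 32 bits
    // and the index of the optional count argument in the low 32 bits, so
    // round-tripping the payload through a plain `int` truncates it.
    uint64_t packAllocSizeArgs(uint32_t elemSizeArg, uint32_t numElemsArg) {
      return (uint64_t(elemSizeArg) << 32) | numElemsArg;
    }
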
Since vector loads and stores from scalar memrefs translate to
llvm.load/store, add the ability to tag said loads and stores as
nontemporal. This mirrors functionality available in
memref.load/store.

Expand the copying of attributes on GPU kernel arguments during LLVM
lowering.

Support copying attributes from values that are already LLVM pointers.

Support copying attributes, like `noundef`, that aren't specific to
(the pointer parts of) arguments.
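
Roughly, the expanded copying amounts to the following sketch (a hypothetical fragment; gpuFunc and llvmFunc stand for the pre- and post-lowering functions):

    // Forward "llvm."-prefixed argument attributes (noalias, noundef, ...)
    // from each gpu.func argument onto the matching llvm.func argument,
    // whether or not the attribute is pointer-specific.
    for (unsigned i = 0, e = gpuFunc.getNumArguments(); i < e; ++i)
      if (DictionaryAttr argDict = gpuFunc.getArgAttrDict(i))
        for (NamedAttribute attr : argDict)
          if (attr.getName().getValue().starts_with("llvm."))
            llvmFunc.setArgAttr(i, attr.getName(), attr.getValue());
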
Add support for !invariant.load metadata (by way of a unit attribute)
to the MLIR representation of llvm.load.

(The message for this last commit is the one quoted in full in the PR description above, so it is not repeated here.)

@krzysz00 krzysz00 merged commit d28f6df into develop Jan 2, 2024
@krzysz00 krzysz00 deleted the memory-analysis branch January 2, 2024 18:23