
Commit 8da7c05

[WebAssembly] Fold constant i8x16.swizzle and i8x16.relaxed.swizzle to shufflevector (#169110)
Resolves #169058.

This adds ~~an InstCombine pass~~ a TTI hook to the WebAssembly backend that folds `i8x16.swizzle` and `i8x16.relaxed.swizzle` operations to `shufflevector` operations if their mask operands are constant. This is mainly useful for abstractions over the raw intrinsics, for instance in architecture-generic SIMD code that may not be able to expose the constant shuffles due to type system limitations.

I took most of this from the x86 backend (in particular, `simplifyX86vpermilvar` in `X86InstCombineIntrinsic`) and adapted it for the WebAssembly backend. There wasn't any previous `instCombineIntrinsic` method on the WebAssembly `TargetTransformInfo`, so I added it. Right now, this swizzle optimization is the only one it performs.

As I noted in the transform itself, the "relaxed" swizzle actually has stricter preconditions than the non-relaxed one. If a non-negative but still out-of-bounds index is provided, the "relaxed" swizzle can choose between returning 0 and the lane at the index modulo 16. However, it must make the same choice every time, and we don't know which choice the runtime will make, so we can't constant-fold it.

The regression tests were mostly generated by Claude and adapted a bit by me (I tried to follow the [InstCombine contributor guide](https://llvm.org/docs/InstCombineContributorGuide.html#tests)). There was previously no WebAssembly subdirectory within the InstCombine tests, so I created that too; as of now, the swizzle fold test is the only file in it. Everything else was written by me (well, partly copy-pasted from the x86 backend).

I'm not sure how to write an Alive2 test for this; I can't find any examples where the input is an arbitrary constant.
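To make the difference between the two instructions concrete, here is a minimal scalar sketch of the semantics as the description above states them. It is not taken from the patch: `V128`, `swizzleModel`, `relaxedSwizzleModel`, and the `wrapsHighIndices` flag are hypothetical names, and the flag merely stands in for the host-fixed choice that the compiler cannot observe.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One 128-bit vector viewed as 16 byte lanes (hypothetical helper type).
using V128 = std::array<std::uint8_t, 16>;

// Model of i8x16.swizzle: any index outside 0..15 yields 0 in that lane.
V128 swizzleModel(const V128 &v, const V128 &idx) {
  V128 out{};
  for (std::size_t i = 0; i < 16; ++i)
    out[i] = idx[i] < 16 ? v[idx[i]] : 0;
  return out;
}

// Model of the relaxed swizzle. `wrapsHighIndices` is an assumption of this
// sketch: it represents the choice the host makes once and then keeps.
V128 relaxedSwizzleModel(const V128 &v, const V128 &idx,
                         bool wrapsHighIndices) {
  V128 out{};
  for (std::size_t i = 0; i < 16; ++i) {
    std::uint8_t k = idx[i];
    if (k < 16)
      out[i] = v[k]; // in bounds: same as swizzle
    else if (k >= 128)
      out[i] = 0; // negative when read as a signed i8: always 0
    else
      out[i] = wrapsHighIndices ? v[k % 16] : 0; // 16..127: host's choice
  }
  return out;
}

// Usage sketch: index 99 is the only lane where the two choices differ.
void demoSwizzleModels() {
  V128 v;
  for (std::size_t i = 0; i < 16; ++i)
    v[i] = static_cast<std::uint8_t>(100 + i);

  V128 idx{};    // every lane selects lane 0 ...
  idx[0] = 0xFF; // ... except lane 0, which holds -1 as an i8
  idx[1] = 99;   // ... and lane 1, which is non-negative but out of bounds

  assert(swizzleModel(v, idx) == relaxedSwizzleModel(v, idx, false));
  assert(relaxedSwizzleModel(v, idx, true)[1] == v[99 % 16]);
  assert(relaxedSwizzleModel(v, idx, true)[0] == 0);
}
```

Under this model, a mask with no index in the range 16..127 gives the same result for either value of the flag, which is the case the fold accepts for the relaxed form.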
1 parent ec51fd2 commit 8da7c05

3 files changed

Lines changed: 203 additions & 0 deletions

llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp

Lines changed: 87 additions & 0 deletions
@@ -13,6 +13,9 @@
 //===----------------------------------------------------------------------===//

 #include "WebAssemblyTargetTransformInfo.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/IntrinsicsWebAssembly.h"
+#include "llvm/Transforms/InstCombine/InstCombiner.h"

 #include "llvm/CodeGen/CostTable.h"
 using namespace llvm;
@@ -493,3 +496,87 @@ bool WebAssemblyTTIImpl::isProfitableToSinkOperands(

   return false;
 }
+
+/// Attempt to convert [relaxed_]swizzle to shufflevector if the mask is
+/// constant.
+static Value *simplifyWasmSwizzle(const IntrinsicInst &II,
+                                  InstCombiner::BuilderTy &Builder,
+                                  bool IsRelaxed) {
+  auto *V = dyn_cast<Constant>(II.getArgOperand(1));
+  if (!V)
+    return nullptr;
+
+  auto *VecTy = cast<FixedVectorType>(II.getType());
+  unsigned NumElts = VecTy->getNumElements();
+  assert(NumElts == 16);
+
+  // Construct a shuffle mask from constant integers or UNDEFs.
+  int Indexes[16];
+  bool AnyOutOfBounds = false;
+
+  for (unsigned I = 0; I < NumElts; ++I) {
+    Constant *COp = V->getAggregateElement(I);
+    if (!COp || (!isa<UndefValue>(COp) && !isa<ConstantInt>(COp)))
+      return nullptr;
+
+    if (isa<UndefValue>(COp)) {
+      Indexes[I] = -1;
+      continue;
+    }
+
+    if (IsRelaxed && cast<ConstantInt>(COp)->getSExtValue() >= NumElts) {
+      // The relaxed_swizzle operation always returns 0 if the lane index is
+      // less than 0 when interpreted as a signed value. For lane indices above
+      // 15, however, it can choose between returning 0 or the lane at `Index %
+      // 16`. However, the choice must be made consistently. As the WebAssembly
+      // spec states:
+      //
+      // "The result of relaxed operators are implementation-dependent, because
+      // the set of possible results may depend on properties of the host
+      // environment, such as its hardware. Technically, their behaviour is
+      // controlled by a set of global parameters to the semantics that an
+      // implementation can instantiate in different ways. These choices are
+      // fixed, that is, parameters are constant during the execution of any
+      // given program."
+      //
+      // The WebAssembly runtime may choose differently from us, so we can't
+      // optimize a relaxed swizzle with lane indices above 15.
+      return nullptr;
+    }
+
+    uint64_t Index = cast<ConstantInt>(COp)->getZExtValue();
+    if (Index >= NumElts) {
+      AnyOutOfBounds = true;
+      // If there are out-of-bounds indices, the swizzle instruction returns
+      // zeroes in those lanes. We'll provide an all-zeroes vector as the
+      // second argument to shufflevector and read the first element from it.
+      Indexes[I] = NumElts;
+      continue;
+    }
+
+    Indexes[I] = Index;
+  }
+
+  auto *V1 = II.getArgOperand(0);
+  auto *V2 =
+      AnyOutOfBounds ? Constant::getNullValue(VecTy) : PoisonValue::get(VecTy);
+
+  return Builder.CreateShuffleVector(V1, V2, ArrayRef(Indexes, NumElts));
+}
+
+std::optional<Instruction *>
+WebAssemblyTTIImpl::instCombineIntrinsic(InstCombiner &IC,
+                                         IntrinsicInst &II) const {
+  Intrinsic::ID IID = II.getIntrinsicID();
+  switch (IID) {
+  case Intrinsic::wasm_swizzle:
+  case Intrinsic::wasm_relaxed_swizzle:
+    if (Value *V = simplifyWasmSwizzle(
+            II, IC.Builder, IID == Intrinsic::wasm_relaxed_swizzle)) {
+      return IC.replaceInstUsesWith(II, V);
+    }
+    break;
+  }
+
+  return std::nullopt;
+}
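The mask construction in `simplifyWasmSwizzle` can be followed without the LLVM types. Below is a rough standalone sketch of the same decision logic, not the patch itself: `MaskLane`, `ShuffleMask`, and `buildShuffleMask` are hypothetical names, `std::nullopt` in the input stands in for an undef/poison mask element, and a `std::nullopt` result stands in for "leave the intrinsic alone".

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// One constant mask lane; std::nullopt models an undef/poison element.
using MaskLane = std::optional<std::uint8_t>;
// shufflevector-style mask: -1 is a poison lane, 0..15 read the first
// operand, 16 reads lane 0 of the second (all-zero) operand.
using ShuffleMask = std::array<int, 16>;

// Returns the shuffle mask, or std::nullopt when the fold must be skipped.
std::optional<ShuffleMask>
buildShuffleMask(const std::array<MaskLane, 16> &mask, bool isRelaxed) {
  ShuffleMask out{};
  for (std::size_t i = 0; i < 16; ++i) {
    if (!mask[i]) {
      out[i] = -1; // undef/poison lane stays undefined in the shuffle
      continue;
    }
    std::uint8_t raw = *mask[i];
    // 16..127 is non-negative as a signed i8 but out of bounds: the relaxed
    // form may return 0 or wrap here, so the fold gives up.
    if (isRelaxed && raw >= 16 && raw < 128)
      return std::nullopt;
    // Every other out-of-bounds index produces 0, read from the zero vector.
    out[i] = raw >= 16 ? 16 : raw;
  }
  return out;
}

// Usage sketch with a reversed mask whose first two lanes are out of bounds.
void demoBuildShuffleMask() {
  std::array<MaskLane, 16> mask;
  for (std::size_t i = 0; i < 16; ++i)
    mask[i] = static_cast<std::uint8_t>(15 - i);
  mask[0] = std::uint8_t{99};   // non-negative, out of bounds
  mask[1] = std::uint8_t{0xFF}; // -1 as an i8

  auto plain = buildShuffleMask(mask, false);
  assert(plain && (*plain)[0] == 16 && (*plain)[1] == 16 && (*plain)[2] == 13);
  assert(!buildShuffleMask(mask, true)); // relaxed form is not folded

  mask[0] = std::uint8_t{0x9D}; // -99 as an i8: negative, so always 0
  assert(buildShuffleMask(mask, true) == plain);
}
```

The sketch compares the raw byte against 128 instead of sign-extending, which is equivalent for 8-bit lanes and avoids a signed/unsigned comparison.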

llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h

Lines changed: 3 additions & 0 deletions
@@ -103,6 +103,9 @@ class WebAssemblyTTIImpl final : public BasicTTIImplBase<WebAssemblyTTIImpl> {
   bool isProfitableToSinkOperands(Instruction *I,
                                   SmallVectorImpl<Use *> &Ops) const override;

+  std::optional<Instruction *>
+  instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const override;
+
   /// @}
 };

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
+; RUN: opt < %s -passes=instcombine -mtriple=wasm32-unknown-unknown -S | FileCheck %s
+
+; swizzle with a constant operand should be optimized to a shufflevector.
+
+; Identity swizzle pattern
+define <16 x i8> @swizzle_identity(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @swizzle_identity(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT: ret <16 x i8> [[V]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>)
+  ret <16 x i8> %result
+}
+
+; Reverse swizzle pattern
+define <16 x i8> @swizzle_reverse(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @swizzle_reverse(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT: [[RESULT:%.*]] = shufflevector <16 x i8> [[V]], <16 x i8> poison, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT: ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> <i8 15, i8 14, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+  ret <16 x i8> %result
+}
+
+; poison elements
+define <16 x i8> @swizzle_with_poison(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @swizzle_with_poison(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT: [[RESULT:%.*]] = shufflevector <16 x i8> [[V]], <16 x i8> poison, <16 x i32> <i32 0, i32 poison, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT: ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> <i8 0, i8 poison, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>)
+  ret <16 x i8> %result
+}
+
+; Negative test: non-constant operand
+define <16 x i8> @swizzle_non_constant(<16 x i8> %v, <16 x i8> %mask) {
+; CHECK-LABEL: define <16 x i8> @swizzle_non_constant(
+; CHECK-SAME: <16 x i8> [[V:%.*]], <16 x i8> [[MASK:%.*]]) {
+; CHECK-NEXT: [[RESULT:%.*]] = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> [[V]], <16 x i8> [[MASK]])
+; CHECK-NEXT: ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> %mask)
+  ret <16 x i8> %result
+}
+
+; Out-of-bounds index, otherwise identity pattern
+define <16 x i8> @swizzle_out_of_bounds_1(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @swizzle_out_of_bounds_1(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT: [[RESULT1:%.*]] = insertelement <16 x i8> [[V]], i8 0, i64 15
+; CHECK-NEXT: ret <16 x i8> [[RESULT1]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 16>)
+  ret <16 x i8> %result
+}
+
+; Out-of-bounds indices, both negative and positive
+define <16 x i8> @swizzle_out_of_bounds_2(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @swizzle_out_of_bounds_2(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT: [[RESULT:%.*]] = shufflevector <16 x i8> [[V]], <16 x i8> <i8 0, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison>, <16 x i32> <i32 16, i32 16, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT: ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> <i8 99, i8 -1, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+  ret <16 x i8> %result
+}
+
+; Identity swizzle pattern (relaxed_swizzle)
+define <16 x i8> @relaxed_swizzle_identity(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @relaxed_swizzle_identity(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT: ret <16 x i8> [[V]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8> %v, <16 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>)
+  ret <16 x i8> %result
+}
+
+; Reverse swizzle pattern (relaxed_swizzle)
+define <16 x i8> @relaxed_swizzle_reverse(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @relaxed_swizzle_reverse(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT: [[RESULT:%.*]] = shufflevector <16 x i8> [[V]], <16 x i8> poison, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT: ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8> %v, <16 x i8> <i8 15, i8 14, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+  ret <16 x i8> %result
+}
+
+; Out-of-bounds index, only negative (relaxed_swizzle)
+define <16 x i8> @relaxed_swizzle_out_of_bounds(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @relaxed_swizzle_out_of_bounds(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT: [[RESULT:%.*]] = shufflevector <16 x i8> [[V]], <16 x i8> <i8 0, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison>, <16 x i32> <i32 16, i32 16, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT: ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8> %v, <16 x i8> <i8 -99, i8 -1, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+  ret <16 x i8> %result
+}
+
+; Negative test: out-of-bounds index, both positive and negative (relaxed_swizzle)
+; The choice between different relaxed semantics can only be made at runtime, since it must be consistent.
+define <16 x i8> @relaxed_swizzle_out_of_bounds_positive(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @relaxed_swizzle_out_of_bounds_positive(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT: [[RESULT:%.*]] = tail call <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8> [[V]], <16 x i8> <i8 99, i8 -1, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+; CHECK-NEXT: ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8> %v, <16 x i8> <i8 99, i8 -1, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+  ret <16 x i8> %result
+}
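One expectation in the test may look surprising: `swizzle_out_of_bounds_1` checks for an `insertelement` rather than a `shufflevector`. Presumably InstCombine simplifies the folded shuffle further after the hook returns; the patch itself only ever emits a shuffle. The following sketch only checks that the two forms compute the same lanes. `shuffle`, `insertLane`, and `foldedShuffleEqualsInsert` are hypothetical helpers that emulate the IR operations on plain arrays.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Lanes = std::array<std::uint8_t, 16>;

// Emulates shufflevector for fully defined masks: indices 0..15 read from
// `a`, indices 16..31 read from `b`.
Lanes shuffle(const Lanes &a, const Lanes &b, const std::array<int, 16> &mask) {
  Lanes out{};
  for (std::size_t i = 0; i < 16; ++i) {
    assert(mask[i] >= 0 && mask[i] < 32);
    out[i] = mask[i] < 16 ? a[mask[i]] : b[mask[i] - 16];
  }
  return out;
}

// Emulates insertelement: returns a copy of `v` with one lane replaced.
Lanes insertLane(Lanes v, std::uint8_t value, std::size_t lane) {
  v[lane] = value;
  return v;
}

// Identity mask except lane 15, which reads lane 0 of an all-zero vector.
// That is the mask the fold would build for swizzle_out_of_bounds_1.
bool foldedShuffleEqualsInsert(const Lanes &v) {
  std::array<int, 16> mask{};
  for (std::size_t i = 0; i < 15; ++i)
    mask[i] = static_cast<int>(i);
  mask[15] = 16;
  Lanes zero{};
  return shuffle(v, zero, mask) == insertLane(v, 0, 15);
}
```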
