Conversation

@Mel-Chen (Contributor) commented Jan 8, 2026

Currently, VPVectorPointerRecipe has the form: vector-pointer %ptr, (%offset).
This makes it difficult to add a stride operand to VPVectorPointerRecipe for #147297.
One approach is to convert VPVectorPointerRecipe into an abstract recipe with the form vector-pointer %ptr, %stride_in_bytes, and revert to handling the unroll part via the VPUnrollPartAccessor approach. The conversion of each VPVectorPointerRecipe to Mul + PtrAdd would then happen only during VPlanTransforms::convertToConcreteRecipes. As a first step, this patch changes the emitted IR from typed GEPs to byte (i8) GEPs. This will also help with future support for unaligned strided accesses.
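As an illustration, take a hypothetical base pointer %base to i64 elements with VF = 2 (the names %base and %part1 are made up here; the same change is visible in the call-costs.ll diff below). The address of the second unrolled part changes from a typed GEP to the equivalent byte GEP:

  ; before: element offset, scaled implicitly by the GEP source element type
  %part1 = getelementptr inbounds i64, ptr %base, i64 2
  ; after: offset expressed directly in bytes (2 elements * 8 bytes)
  %part1 = getelementptr inbounds i8, ptr %base, i64 16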

llvmbot (Member) commented Jan 8, 2026

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-powerpc

Author: Mel Chen (Mel-Chen)

Changes

Currently, VPVectorPointerRecipe has the form: vector-pointer %ptr, (%offset).
This makes it difficult to add a stride operand to VPVectorPointerRecipe for #147297.
One approach is to convert VPVectorPointerRecipe into an abstract recipe with the form vector-pointer %ptr, %stride_in_bytes, and revert to handling the unroll part via the VPUnrollPartAccessor approach. The conversion of each VPVectorPointerRecipe to Mul + PtrAdd would then happen only during VPlanTransforms::convertToConcreteRecipes. As a first step, this patch changes the emitted IR from typed GEPs to byte (i8) GEPs. This will also help with future support for unaligned strided accesses.
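With the VPlanUnroll.cpp change in the diff below, the per-part offset becomes Part * VF * allocSize(element type), in bytes. A sketch with assumed values VF = 8, i16 elements (alloc size 2), and UF = 4 (%base is hypothetical; the resulting constants match the add_i16 changes in epilog-vectorization-factors.ll):

  ; parts 1..3 use byte offsets 1*8*2 = 16, 2*8*2 = 32, 3*8*2 = 48
  ; instead of the previous i16 element indices 8, 16, 24
  %part1 = getelementptr inbounds i8, ptr %base, i64 16
  %part2 = getelementptr inbounds i8, ptr %base, i64 32
  %part3 = getelementptr inbounds i8, ptr %base, i64 48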


Patch is 596.89 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/174934.diff

121 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+1-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp (+7-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll (+7-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll (+2-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-factors.ll (+21-21)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/f128-fmuladd-reduction.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/fixed-wide-lane-mask.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/fmax-without-fast-math-flags.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/fmin-without-fast-math-flags.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll (+18-18)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/gather-cost.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/induction-costs.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/interleaving-reduction.ll (+10-7)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/intrinsiccost.ll (+2-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/licm-calls.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/load-cast-context.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll (+15-10)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll (+7-7)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce.ll (+5-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/pr151664-cost-hoisted-vector-scalable.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/reduction-recurrence-costs-sve.ll (+4-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/replicating-load-store-costs.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll (+317-290)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/select-index.ll (+5-5)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-inloop-reductions.ll (+2-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-reductions.ll (+2-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-strict-reductions.ll (+2-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll (+20-12)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-fneg.ll (+3-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-gather-scatter.ll (+3-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll (+4-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-live-out-pointer-induction.ll (+2-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-multiexit.ll (+6-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-predicated-costs.ll (+3-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-runtime-check-size-based-threshold.ll (+5-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-unroll.ll (+15-9)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-vscale-based-trip-counts.ll (+15-10)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-wide-lane-mask.ll (+12-8)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve2-histcnt.ll (+48-9)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-cost.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-unroll.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/uniform-args-call-variants.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/LoongArch/defaults.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/PowerPC/optimal-epilog-vectorization.ll (+56-56)
  • (modified) llvm/test/Transforms/LoopVectorize/PowerPC/small-loop-rdx.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/reductions.ll (+7-7)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll (+13-7)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-inloop-reduction.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/tail-folding-reduction.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/cast-costs.ll (+5-5)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/conversion-cost.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/cost-model.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-inductions.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/float-induction-x86.ll (+19-19)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/fminimumnum.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/gather-cost.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/imprecise-through-phis.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/induction-costs.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/induction-step.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/intrinsiccost.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/invariant-store-vectorization.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/iv-live-outs.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll (+33-33)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/masked-store-cost.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll (+117-117)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/metadata-enable.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr23997.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr35432.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr47437.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/predicate-switch.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/reduction-crash.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/reduction-fastmath.ll (+5-5)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/strided_load_cost.ll (+10-10)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/uniform_load.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/vectorize-force-tail-with-evl.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/widened-value-used-as-scalar-and-first-lane.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/assume.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/cse-casts.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/cse-gep-source-element-type.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/dead_instructions.ll (+7-7)
  • (modified) llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-const-TC.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/expand-scev-after-invoke.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/fcmp-uno-fold-interleave.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-dead-instructions.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll (+13-13)
  • (modified) llvm/test/Transforms/LoopVectorize/fmax-without-fast-math-flags-interleave.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/induction.ll (+13-13)
  • (modified) llvm/test/Transforms/LoopVectorize/iv-select-cmp-nested-loop.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/iv-select-cmp-trunc.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/iv-select-cmp.ll (+39-39)
  • (modified) llvm/test/Transforms/LoopVectorize/metadata.ll (+13-13)
  • (modified) llvm/test/Transforms/LoopVectorize/minimumnum-maximumnum-reductions.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/narrow-to-single-scalar.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/nested-loops-scev-expansion.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/noalias-scope-decl.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/predicate-switch.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop.ll (+44-44)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-odd-interleave-counts.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/scalable-assume.ll (+9-6)
  • (modified) llvm/test/Transforms/LoopVectorize/scalable-first-order-recurrence.ll (+15-9)
  • (modified) llvm/test/Transforms/LoopVectorize/scalable-inductions.ll (+6-4)
  • (modified) llvm/test/Transforms/LoopVectorize/scalable-iv-outside-user.ll (+2-1)
  • (modified) llvm/test/Transforms/LoopVectorize/scalable-loop-unpredicated-body-scalar-tail.ll (+3-2)
  • (modified) llvm/test/Transforms/LoopVectorize/scalar_after_vectorization.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/select-cmp-multiuse.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/select-cmp.ll (+15-15)
  • (modified) llvm/test/Transforms/LoopVectorize/select-index-interleaving.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/single-early-exit-interleave-hint.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/single-early-exit-interleave.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/struct-return-replicate.ll (+10-10)
  • (modified) llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination-outside-iv-users.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/vector-pointer-gep-idxty-addrspace.ll (+6-6)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index b9cd322d9ec69..60365f3ca65c4 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -2595,7 +2595,7 @@ void VPVectorPointerRecipe::execute(VPTransformState &State) {
          "Expected prior simplification of recipe without offset");
   Value *Ptr = State.get(getOperand(0), VPLane(0));
   Value *Offset = State.get(getOffset(), true);
-  Value *ResultPtr = Builder.CreateGEP(getSourceElementType(), Ptr, Offset, "",
+  Value *ResultPtr = Builder.CreateGEP(Builder.getInt8Ty(), Ptr, Offset, "",
                                        getGEPNoWrapFlags());
   State.set(this, ResultPtr, /*IsScalar*/ true);
 }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
index be2a68ca40b93..5c8f7557ef4fc 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
@@ -309,8 +309,14 @@ void UnrollState::unrollRecipeByUF(VPRecipeBase &R) {
       VPValue *VFxPart = Builder.createOverflowingOp(
           Instruction::Mul, {VF, Plan.getConstantInt(IndexTy, Part)},
           {true, true});
+      VPValue *Offset = Builder.createOverflowingOp(
+          Instruction::Mul,
+          {VFxPart,
+           Plan.getConstantInt(
+               IndexTy, DL.getTypeAllocSize(VPR->getSourceElementType()))},
+          {true, true});
       Copy->setOperand(0, VPR->getOperand(0));
-      Copy->addOperand(VFxPart);
+      Copy->addOperand(Offset);
       continue;
     }
     if (auto *Red = dyn_cast<VPReductionRecipe>(&R)) {
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
index 95b4dcb23dd47..c1e29dd9d04f9 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
@@ -14,7 +14,7 @@ define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
 ; CHECK-NEXT:    [[VECTOR_RECUR:%.*]] = phi <2 x i64> [ <i64 poison, i64 0>, %[[VECTOR_PH]] ], [ [[WIDE_LOAD1:%.*]], %[[VECTOR_BODY]] ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 [[INDEX]]
-; CHECK-NEXT:    [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[TMP2]], i64 2
+; CHECK-NEXT:    [[TMP5:%.*]] = getelementptr inbounds i8, ptr [[TMP2]], i64 16
 ; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <2 x i64>, ptr [[TMP2]], align 8
 ; CHECK-NEXT:    [[WIDE_LOAD1]] = load <2 x i64>, ptr [[TMP5]], align 8
 ; CHECK-NEXT:    [[TMP6:%.*]] = shufflevector <2 x i64> [[VECTOR_RECUR]], <2 x i64> [[WIDE_LOAD]], <2 x i32> <i32 1, i32 2>
@@ -22,7 +22,7 @@ define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
 ; CHECK-NEXT:    [[TMP8:%.*]] = call <2 x i64> @llvm.fshl.v2i64(<2 x i64> splat (i64 1), <2 x i64> [[TMP6]], <2 x i64> splat (i64 1))
 ; CHECK-NEXT:    [[TMP9:%.*]] = call <2 x i64> @llvm.fshl.v2i64(<2 x i64> splat (i64 1), <2 x i64> [[TMP7]], <2 x i64> splat (i64 1))
 ; CHECK-NEXT:    [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[DST]], i64 [[INDEX]]
-; CHECK-NEXT:    [[TMP13:%.*]] = getelementptr inbounds i64, ptr [[TMP10]], i64 2
+; CHECK-NEXT:    [[TMP13:%.*]] = getelementptr inbounds i8, ptr [[TMP10]], i64 16
 ; CHECK-NEXT:    store <2 x i64> [[TMP8]], ptr [[TMP10]], align 8
 ; CHECK-NEXT:    store <2 x i64> [[TMP9]], ptr [[TMP13]], align 8
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
index d7d77cb4325d4..d4b59e6839999 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
@@ -63,7 +63,7 @@ define void @loop_dependent_cond(ptr %src, ptr noalias %dst, i64 %N) {
 ; DEFAULT:       [[VECTOR_BODY]]:
 ; DEFAULT-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[PRED_STORE_CONTINUE7:.*]] ]
 ; DEFAULT-NEXT:    [[TMP3:%.*]] = getelementptr double, ptr [[SRC]], i64 [[INDEX]]
-; DEFAULT-NEXT:    [[TMP6:%.*]] = getelementptr double, ptr [[TMP3]], i64 2
+; DEFAULT-NEXT:    [[TMP6:%.*]] = getelementptr i8, ptr [[TMP3]], i64 16
 ; DEFAULT-NEXT:    [[WIDE_LOAD:%.*]] = load <2 x double>, ptr [[TMP3]], align 8
 ; DEFAULT-NEXT:    [[WIDE_LOAD1:%.*]] = load <2 x double>, ptr [[TMP6]], align 8
 ; DEFAULT-NEXT:    [[TMP7:%.*]] = call <2 x double> @llvm.fabs.v2f64(<2 x double> [[WIDE_LOAD]])
@@ -545,11 +545,14 @@ define void @multiple_exit_conditions(ptr %src, ptr noalias %dst) #1 {
 ; DEFAULT-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i16> poison, i16 [[TMP8]], i64 0
 ; DEFAULT-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i16> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i16> poison, <vscale x 4 x i32> zeroinitializer
 ; DEFAULT-NEXT:    [[TMP9:%.*]] = uitofp <vscale x 4 x i16> [[BROADCAST_SPLAT]] to <vscale x 4 x double>
+; DEFAULT-NEXT:    [[TMP16:%.*]] = shl nuw nsw i64 [[TMP11]], 3
 ; DEFAULT-NEXT:    [[TMP14:%.*]] = shl nuw nsw i64 [[TMP11]], 1
+; DEFAULT-NEXT:    [[TMP13:%.*]] = shl nuw nsw i64 [[TMP14]], 3
 ; DEFAULT-NEXT:    [[TMP17:%.*]] = mul nuw nsw i64 [[TMP11]], 3
-; DEFAULT-NEXT:    [[TMP12:%.*]] = getelementptr double, ptr [[NEXT_GEP1]], i64 [[TMP11]]
-; DEFAULT-NEXT:    [[TMP15:%.*]] = getelementptr double, ptr [[NEXT_GEP1]], i64 [[TMP14]]
-; DEFAULT-NEXT:    [[TMP18:%.*]] = getelementptr double, ptr [[NEXT_GEP1]], i64 [[TMP17]]
+; DEFAULT-NEXT:    [[TMP20:%.*]] = shl nuw nsw i64 [[TMP17]], 3
+; DEFAULT-NEXT:    [[TMP12:%.*]] = getelementptr i8, ptr [[NEXT_GEP1]], i64 [[TMP16]]
+; DEFAULT-NEXT:    [[TMP15:%.*]] = getelementptr i8, ptr [[NEXT_GEP1]], i64 [[TMP13]]
+; DEFAULT-NEXT:    [[TMP18:%.*]] = getelementptr i8, ptr [[NEXT_GEP1]], i64 [[TMP20]]
 ; DEFAULT-NEXT:    store <vscale x 4 x double> [[TMP9]], ptr [[NEXT_GEP1]], align 8
 ; DEFAULT-NEXT:    store <vscale x 4 x double> [[TMP9]], ptr [[TMP12]], align 8
 ; DEFAULT-NEXT:    store <vscale x 4 x double> [[TMP9]], ptr [[TMP15]], align 8
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll b/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
index f0664197dcb94..5d2bf22b946ec 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
@@ -287,7 +287,7 @@ define void @trunc_invariant_sdiv_result(i32 %a, i32 %b, ptr noalias %src, ptr %
 ; CHECK-NEXT:    [[TMP5:%.*]] = mul <16 x i16> [[TMP0]], [[TMP3]]
 ; CHECK-NEXT:    [[TMP6:%.*]] = mul <16 x i16> [[TMP0]], [[TMP4]]
 ; CHECK-NEXT:    [[TMP7:%.*]] = getelementptr inbounds i16, ptr [[DST]], i64 [[INDEX]]
-; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr inbounds i16, ptr [[TMP7]], i64 16
+; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr inbounds i8, ptr [[TMP7]], i64 32
 ; CHECK-NEXT:    store <16 x i16> [[TMP5]], ptr [[TMP7]], align 2
 ; CHECK-NEXT:    store <16 x i16> [[TMP6]], ptr [[TMP8]], align 2
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 32
@@ -413,7 +413,7 @@ define void @old_and_new_size_equalko(ptr noalias %src, ptr noalias %dst) {
 ; CHECK:       [[VECTOR_BODY]]:
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i32 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
 ; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i32 [[INDEX]]
-; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[TMP0]], i64 4
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i64 32
 ; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <4 x i64>, ptr [[TMP0]], align 8
 ; CHECK-NEXT:    [[WIDE_LOAD1:%.*]] = load <4 x i64>, ptr [[TMP1]], align 8
 ; CHECK-NEXT:    [[TMP2:%.*]] = trunc <4 x i64> [[WIDE_LOAD]] to <4 x i1>
@@ -427,7 +427,7 @@ define void @old_and_new_size_equalko(ptr noalias %src, ptr noalias %dst) {
 ; CHECK-NEXT:    [[TMP10:%.*]] = trunc <4 x i64> [[TMP8]] to <4 x i32>
 ; CHECK-NEXT:    [[TMP11:%.*]] = trunc <4 x i64> [[TMP9]] to <4 x i32>
 ; CHECK-NEXT:    [[TMP12:%.*]] = getelementptr inbounds i32, ptr [[DST]], i32 [[INDEX]]
-; CHECK-NEXT:    [[TMP13:%.*]] = getelementptr inbounds i32, ptr [[TMP12]], i64 4
+; CHECK-NEXT:    [[TMP13:%.*]] = getelementptr inbounds i8, ptr [[TMP12]], i64 16
 ; CHECK-NEXT:    store <4 x i32> [[TMP10]], ptr [[TMP12]], align 4
 ; CHECK-NEXT:    store <4 x i32> [[TMP11]], ptr [[TMP13]], align 4
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll
index 5d550dc07ce4b..cc79e4b730364 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll
@@ -37,7 +37,8 @@ define void @sdiv_feeding_gep(ptr %dst, i32 %x, i64 %M, i64 %conv6, i64 %N) {
 ; CHECK-NEXT:    [[TMP30:%.*]] = add i32 [[TMP28]], [[TMP26]]
 ; CHECK-NEXT:    [[TMP32:%.*]] = sext i32 [[TMP30]] to i64
 ; CHECK-NEXT:    [[TMP34:%.*]] = getelementptr double, ptr [[DST]], i64 [[TMP32]]
-; CHECK-NEXT:    [[TMP39:%.*]] = getelementptr double, ptr [[TMP34]], i64 [[TMP11]]
+; CHECK-NEXT:    [[TMP19:%.*]] = shl nuw nsw i64 [[TMP11]], 3
+; CHECK-NEXT:    [[TMP39:%.*]] = getelementptr i8, ptr [[TMP34]], i64 [[TMP19]]
 ; CHECK-NEXT:    store <vscale x 2 x double> zeroinitializer, ptr [[TMP34]], align 8
 ; CHECK-NEXT:    store <vscale x 2 x double> zeroinitializer, ptr [[TMP39]], align 8
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP9]]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-factors.ll b/llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-factors.ll
index 549df337e6907..ca59522a9ff35 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-factors.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-factors.ll
@@ -128,17 +128,17 @@ define void @add_i16(ptr noalias nocapture noundef writeonly %A, ptr nocapture n
 ; CHECK:       vector.body:
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i16, ptr [[B:%.*]], i64 [[INDEX]]
-; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr inbounds i16, ptr [[TMP1]], i64 8
-; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds i16, ptr [[TMP1]], i64 16
-; CHECK-NEXT:    [[TMP5:%.*]] = getelementptr inbounds i16, ptr [[TMP1]], i64 24
+; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr inbounds i8, ptr [[TMP1]], i64 16
+; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds i8, ptr [[TMP1]], i64 32
+; CHECK-NEXT:    [[TMP5:%.*]] = getelementptr inbounds i8, ptr [[TMP1]], i64 48
 ; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <8 x i16>, ptr [[TMP1]], align 1
 ; CHECK-NEXT:    [[WIDE_LOAD2:%.*]] = load <8 x i16>, ptr [[TMP3]], align 1
 ; CHECK-NEXT:    [[WIDE_LOAD3:%.*]] = load <8 x i16>, ptr [[TMP4]], align 1
 ; CHECK-NEXT:    [[WIDE_LOAD4:%.*]] = load <8 x i16>, ptr [[TMP5]], align 1
 ; CHECK-NEXT:    [[TMP6:%.*]] = getelementptr inbounds i16, ptr [[C:%.*]], i64 [[INDEX]]
-; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr inbounds i16, ptr [[TMP6]], i64 8
-; CHECK-NEXT:    [[TMP9:%.*]] = getelementptr inbounds i16, ptr [[TMP6]], i64 16
-; CHECK-NEXT:    [[TMP10:%.*]] = getelementptr inbounds i16, ptr [[TMP6]], i64 24
+; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr inbounds i8, ptr [[TMP6]], i64 16
+; CHECK-NEXT:    [[TMP9:%.*]] = getelementptr inbounds i8, ptr [[TMP6]], i64 32
+; CHECK-NEXT:    [[TMP10:%.*]] = getelementptr inbounds i8, ptr [[TMP6]], i64 48
 ; CHECK-NEXT:    [[WIDE_LOAD5:%.*]] = load <8 x i16>, ptr [[TMP6]], align 1
 ; CHECK-NEXT:    [[WIDE_LOAD6:%.*]] = load <8 x i16>, ptr [[TMP8]], align 1
 ; CHECK-NEXT:    [[WIDE_LOAD7:%.*]] = load <8 x i16>, ptr [[TMP9]], align 1
@@ -148,9 +148,9 @@ define void @add_i16(ptr noalias nocapture noundef writeonly %A, ptr nocapture n
 ; CHECK-NEXT:    [[TMP13:%.*]] = add <8 x i16> [[WIDE_LOAD7]], [[WIDE_LOAD3]]
 ; CHECK-NEXT:    [[TMP14:%.*]] = add <8 x i16> [[WIDE_LOAD8]], [[WIDE_LOAD4]]
 ; CHECK-NEXT:    [[TMP15:%.*]] = getelementptr inbounds i16, ptr [[A:%.*]], i64 [[INDEX]]
-; CHECK-NEXT:    [[TMP17:%.*]] = getelementptr inbounds i16, ptr [[TMP15]], i64 8
-; CHECK-NEXT:    [[TMP18:%.*]] = getelementptr inbounds i16, ptr [[TMP15]], i64 16
-; CHECK-NEXT:    [[TMP19:%.*]] = getelementptr inbounds i16, ptr [[TMP15]], i64 24
+; CHECK-NEXT:    [[TMP17:%.*]] = getelementptr inbounds i8, ptr [[TMP15]], i64 16
+; CHECK-NEXT:    [[TMP18:%.*]] = getelementptr inbounds i8, ptr [[TMP15]], i64 32
+; CHECK-NEXT:    [[TMP19:%.*]] = getelementptr inbounds i8, ptr [[TMP15]], i64 48
 ; CHECK-NEXT:    store <8 x i16> [[TMP11]], ptr [[TMP15]], align 1
 ; CHECK-NEXT:    store <8 x i16> [[TMP12]], ptr [[TMP17]], align 1
 ; CHECK-NEXT:    store <8 x i16> [[TMP13]], ptr [[TMP18]], align 1
@@ -237,17 +237,17 @@ define void @add_i32(ptr noalias nocapture noundef writeonly %A, ptr nocapture n
 ; CHECK:       vector.body:
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], i64 [[INDEX]]
-; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 4
-; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 8
-; CHECK-NEXT:    [[TMP5:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 12
+; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr inbounds i8, ptr [[TMP1]], i64 16
+; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds i8, ptr [[TMP1]], i64 32
+; CHECK-NEXT:    [[TMP5:%.*]] = getelementptr inbounds i8, ptr [[TMP1]], i64 48
 ; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP1]], align 1
 ; CHECK-NEXT:    [[WIDE_LOAD2:%.*]] = load <4 x i32>, ptr [[TMP3]], align 1
 ; CHECK-NEXT:    [[WIDE_LOAD3:%.*]] = load <4 x i32>, ptr [[TMP4]], align 1
 ; CHECK-NEXT:    [[WIDE_LOAD4:%.*]] = load <4 x i32>, ptr [[TMP5]], align 1
 ; CHECK-NEXT:    [[TMP6:%.*]] = getelementptr inbounds i32, ptr [[C:%.*]], i64 [[INDEX]]
-; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr inbounds i32, ptr [[TMP6]], i64 4
-; CHECK-NEXT:    [[TMP9:%.*]] = getelementptr inbounds i32, ptr [[TMP6]], i64 8
-; CHECK-NEXT:    [[TMP10:%.*]] = getelementptr inbounds i32, ptr [[TMP6]], i64 12
+; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr inbounds i8, ptr [[TMP6]], i64 16
+; CHECK-NEXT:    [[TMP9:%.*]] = getelementptr inbounds i8, ptr [[TMP6]], i64 32
+; CHECK-NEXT:    [[TMP10:%.*]] = getelementptr inbounds i8, ptr [[TMP6]], i64 48
 ; CHECK-NEXT:    [[WIDE_LOAD5:%.*]] = load <4 x i32>, ptr [[TMP6]], align 1
 ; CHECK-NEXT:    [[WIDE_LOAD6:%.*]] = load <4 x i32>, ptr [[TMP8]], align 1
 ; CHECK-NEXT:    [[WIDE_LOAD7:%.*]] = load <4 x i32>, ptr [[TMP9]], align 1
@@ -257,9 +257,9 @@ define void @add_i32(ptr noalias nocapture noundef writeonly %A, ptr nocapture n
 ; CHECK-NEXT:    [[TMP13:%.*]] = add <4 x i32> [[WIDE_LOAD7]], [[WIDE_LOAD3]]
 ; CHECK-NEXT:    [[TMP14:%.*]] = add <4 x i32> [[WIDE_LOAD8]], [[WIDE_LOAD4]]
 ; CHECK-NEXT:    [[TMP15:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[INDEX]]
-; CHECK-NEXT:    [[TMP17:%.*]] = getelementptr inbounds i32, ptr [[TMP15]], i64 4
-; CHECK-NEXT:    [[TMP18:%.*]] = getelementptr inbounds i32, ptr [[TMP15]], i64 8
-; CHECK-NEXT:    [[TMP19:%.*]] = getelementptr inbounds i32, ptr [[TMP15]], i64 12
+; CHECK-NEXT:    [[TMP17:%.*]] = getelementptr inbounds i8, ptr [[TMP15]], i64 16
+; CHECK-NEXT:    [[TMP18:%.*]] = getelementptr inbounds i8, ptr [[TMP15]], i64 32
+; CHECK-NEXT:    [[TMP19:%.*]] = getelementptr inbounds i8, ptr [[TMP15]], i64 48
 ; CHECK-NEXT:    store <4 x i32> [[TMP11]], ptr [[TMP15]], align 1
 ; CHECK-NEXT:    store <4 x i32> [[TMP12]], ptr [[TMP17]], align 1
 ; CHECK-NEXT:    store <4 x i32> [[TMP13]], ptr [[TMP18]], align 1
@@ -447,9 +447,9 @@ define void @trip_count_based_on_ptrtoint(i64 %x) "target-cpu"="apple-m1" {
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT:    [[TMP7:%.*]] = mul i64 [[INDEX]], 4
 ; CHECK-NEXT:    [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[PTR_START]], i64 [[TMP7]]
-; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr i32, ptr [[NEXT_GEP]], i64 4
-; CHECK-NEXT:    [[TMP9:%.*]] = getelementptr i32, ptr [[NEXT_GEP]], i64 8
-; CHECK-NEXT:    [[TMP10:%.*]] = getelementptr i32, ptr [[NEXT_GEP]], i64 12
+; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr i8, ptr [[NEXT_GEP]], i64 16
+; CHECK-NEXT:    [[TMP9:%.*]] = getelementptr i8, ptr [[NEXT_GEP]], i64 32
+; CHECK-NEXT:    [[TMP10:%.*]] = getelementptr i8, ptr [[NEXT_GEP]], i64 48
 ; CHECK-NEXT:    store <4 x i32> zeroinitializer, ptr [[NEXT_GEP]], align 4
 ; CHECK-NEXT:    store <4 x i32> zeroinitializer, ptr [[TMP8]], align 4
 ; CHECK-NEXT:    store <4 x i32> zeroinitializer, ptr [[TMP9]], align 4
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll b/llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll
index 85726c161cc54..d05a6c35f39c7 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll
@@ -117,7 +117,7 @@ define void @test_widen_induction(ptr %A, i64 %N) {
 ; CHECK-NEXT:    [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT:    [[STEP_ADD:%.*]] = add <2 x i64> [[VEC_IND]], splat (i64 2)
 ; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
-; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i64 2
+; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr inbounds i8, ptr [[TMP1]], i64 16
 ; CHECK-NEXT:    store <2 x i64> [[VEC_IND]], ptr [[TMP1]], align 4
 ; CHECK-NEXT:    store <2 x i64> [[STEP_ADD]], ptr [[TMP3]], align 4
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
@@ -201,7 +201,7 @@ define void @test_widen_induction_variable_start(ptr %A, i64 %N, i64 %start) {
 ; CHECK-NEXT:    [[STEP_ADD:%.*]] = add <2 x i64> [[VEC_IND]], splat (i64 2)
 ; CHECK-NEXT:    [[OFFSET_IDX:%.*]] = add i64 [[START]], [[INDEX]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[OFFSET_IDX]]
-; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[TMP2]], i64 2
+; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds i8, ptr [[TMP2]], i64 16
 ; CHECK-NEXT:    store <2 x i64> [[VEC_IND]], ptr [[TMP2]], align 4
 ; CHECK-NEXT:    store <2 x i64> [[STEP_ADD]], ptr [[TMP4]], align 4
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
@@ -285,7 +285,7 @@ define void @test_widen_induction_step_2(ptr %A, i64 %N, i32 %step) {
 ; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = add <2 x i64> [[VEC_IND]], splat (i64 10)
 ; CHECK-NEXT:    [[TMP3:%.*]] = add <2 x i64> [[STEP_ADD]], splat (i64 10)
-; CHECK-NEXT:    [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i64 2
+; CHECK-NEXT:    [[TMP5:%.*]] = getelementptr inbounds i8, ptr [[TMP1]], i64 16
 ; CHECK-NEXT:    store <2 x i64> [[TMP2]], ptr [[TMP1]], align 4
 ; CHECK-NEXT:    store <2 x i64> [[TMP3]], ptr [[TMP5]], align 4
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/f128-fmuladd-reduction.ll b/llvm/test/Transforms/LoopVectorize/AArch64/f128-fmuladd-reduction.ll
index feb0175e75542..9045bb7b070d6 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/f128-fmuladd-reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/f128-fmuladd-reduction.ll
@@ -21,16 +21,16 @@ define double @fp128_fmuladd_reduction(ptr %start0, ptr %start1, ptr %end0, ptr
 ; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr i8, ptr [[START0]], i64 [[TMP0]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = mul i64 [[INDEX]], 8
 ; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr i8, ptr [[START1]], i64 [[TMP2]]
-; CHECK-N...
[truncated]

llvmbot (Member) commented Jan 8, 2026

@llvm/pr-subscribers-backend-risc-v

@@ -447,9 +447,9 @@ define void @trip_count_based_on_ptrtoint(i64 %x) "target-cpu"="apple-m1" {
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT:    [[TMP7:%.*]] = mul i64 [[INDEX]], 4
 ; CHECK-NEXT:    [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[PTR_START]], i64 [[TMP7]]
-; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr i32, ptr [[NEXT_GEP]], i64 4
-; CHECK-NEXT:    [[TMP9:%.*]] = getelementptr i32, ptr [[NEXT_GEP]], i64 8
-; CHECK-NEXT:    [[TMP10:%.*]] = getelementptr i32, ptr [[NEXT_GEP]], i64 12
+; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr i8, ptr [[NEXT_GEP]], i64 16
+; CHECK-NEXT:    [[TMP9:%.*]] = getelementptr i8, ptr [[NEXT_GEP]], i64 32
+; CHECK-NEXT:    [[TMP10:%.*]] = getelementptr i8, ptr [[NEXT_GEP]], i64 48
 ; CHECK-NEXT:    store <4 x i32> zeroinitializer, ptr [[NEXT_GEP]], align 4
 ; CHECK-NEXT:    store <4 x i32> zeroinitializer, ptr [[TMP8]], align 4
 ; CHECK-NEXT:    store <4 x i32> zeroinitializer, ptr [[TMP9]], align 4
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll b/llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll
index 85726c161cc54..d05a6c35f39c7 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll
@@ -117,7 +117,7 @@ define void @test_widen_induction(ptr %A, i64 %N) {
 ; CHECK-NEXT:    [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT:    [[STEP_ADD:%.*]] = add <2 x i64> [[VEC_IND]], splat (i64 2)
 ; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
-; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i64 2
+; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr inbounds i8, ptr [[TMP1]], i64 16
 ; CHECK-NEXT:    store <2 x i64> [[VEC_IND]], ptr [[TMP1]], align 4
 ; CHECK-NEXT:    store <2 x i64> [[STEP_ADD]], ptr [[TMP3]], align 4
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
@@ -201,7 +201,7 @@ define void @test_widen_induction_variable_start(ptr %A, i64 %N, i64 %start) {
 ; CHECK-NEXT:    [[STEP_ADD:%.*]] = add <2 x i64> [[VEC_IND]], splat (i64 2)
 ; CHECK-NEXT:    [[OFFSET_IDX:%.*]] = add i64 [[START]], [[INDEX]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[OFFSET_IDX]]
-; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[TMP2]], i64 2
+; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds i8, ptr [[TMP2]], i64 16
 ; CHECK-NEXT:    store <2 x i64> [[VEC_IND]], ptr [[TMP2]], align 4
 ; CHECK-NEXT:    store <2 x i64> [[STEP_ADD]], ptr [[TMP4]], align 4
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
@@ -285,7 +285,7 @@ define void @test_widen_induction_step_2(ptr %A, i64 %N, i32 %step) {
 ; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = add <2 x i64> [[VEC_IND]], splat (i64 10)
 ; CHECK-NEXT:    [[TMP3:%.*]] = add <2 x i64> [[STEP_ADD]], splat (i64 10)
-; CHECK-NEXT:    [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i64 2
+; CHECK-NEXT:    [[TMP5:%.*]] = getelementptr inbounds i8, ptr [[TMP1]], i64 16
 ; CHECK-NEXT:    store <2 x i64> [[TMP2]], ptr [[TMP1]], align 4
 ; CHECK-NEXT:    store <2 x i64> [[TMP3]], ptr [[TMP5]], align 4
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/f128-fmuladd-reduction.ll b/llvm/test/Transforms/LoopVectorize/AArch64/f128-fmuladd-reduction.ll
index feb0175e75542..9045bb7b070d6 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/f128-fmuladd-reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/f128-fmuladd-reduction.ll
@@ -21,16 +21,16 @@ define double @fp128_fmuladd_reduction(ptr %start0, ptr %start1, ptr %end0, ptr
 ; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr i8, ptr [[START0]], i64 [[TMP0]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = mul i64 [[INDEX]], 8
 ; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr i8, ptr [[START1]], i64 [[TMP2]]
-; CHECK-N...
[truncated]
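Every test update above follows the same pattern: a typed GEP becomes an equivalent i8 GEP whose constant index is pre-scaled by the element type's alloc size. For example (illustrative, with 2-byte i16 elements), the two forms below compute the same address:

%q = getelementptr inbounds i16, ptr %p, i64 8    ; old: 8 x i16 = 16 bytes
%q = getelementptr inbounds i8, ptr %p, i64 16    ; new: 16 bytes directly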

@llvmbot
Member

llvmbot commented Jan 8, 2026

@llvm/pr-subscribers-vectorizers

@Mel-Chen
Contributor Author

Mel-Chen commented Jan 8, 2026

Another solution to the stride-operand issue would be to completely change the unrolling scheme of VPVectorPointerRecipe from:

Part 0: %part0_gep = %ptr
Part 1: %part1_gep = gep <SrcEltType>, %ptr, %VF * 1
Part 2: %part2_gep = gep <SrcEltType>, %ptr, %VF * 2
...

to:

Part 0: %part0_gep = %ptr
Part 1: %part1_gep = gep <SrcEltType>, %part0_gep, %VF
Part 2: %part2_gep = gep <SrcEltType>, %part1_gep, %VF
...

However, this would introduce dependencies between the GEPs of successive parts.
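Concretely, with SrcEltType = i32 and VF = 4 (illustrative values only), the two schemes would lower to:

Current scheme, all parts based on %ptr:

%part1_gep = getelementptr i32, ptr %ptr, i64 4
%part2_gep = getelementptr i32, ptr %ptr, i64 8

Chained scheme, each part based on the previous one:

%part1_gep = getelementptr i32, ptr %ptr, i64 4
%part2_gep = getelementptr i32, ptr %part1_gep, i64 4

In the chained form, %part2_gep cannot be computed until %part1_gep is available, so the address computation becomes a serial chain rather than a set of independent operations.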

Contributor

@lukel97 lukel97 left a comment


One approach to solve this is to convert VPVectorPointerRecipe into an abstract recipe with the form vector-pointer %ptr, %stride_in_bytes, and revert to handling the unroll part via the VPUnrollPartAccessor approach

My understanding was that we wanted to move away from VPUnrollPartAccessor and unroll things directly in VPlanUnroll. Is there a reason why we can't use stride_in_bytes to unroll in VPlanUnroll? I think that would also allow us to unify it with VPVectorEndPointerRecipe, see #172372
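For reference, a minimal sketch of that direction, reusing the calls the patch already makes in UnrollState::unrollRecipeByUF and assuming a hypothetical StrideInBytes operand on the recipe (that operand does not exist yet):

// Scale the per-part lane count by the byte stride; the unroller then
// needs no knowledge of the source element type at all.
VPValue *VFxPart = Builder.createOverflowingOp(
    Instruction::Mul, {VF, Plan.getConstantInt(IndexTy, Part)}, {true, true});
VPValue *Offset = Builder.createOverflowingOp(
    Instruction::Mul, {VFxPart, StrideInBytes}, {true, true});
Copy->setOperand(0, VPR->getOperand(0));
Copy->addOperand(Offset);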

  Value *Ptr = State.get(getOperand(0), VPLane(0));
  Value *Offset = State.get(getOffset(), true);
-  Value *ResultPtr = Builder.CreateGEP(getSourceElementType(), Ptr, Offset, "",
+  Value *ResultPtr = Builder.CreateGEP(Builder.getInt8Ty(), Ptr, Offset, "",
Contributor


Can this use Builder.CreatePtrAdd?
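For context, IRBuilder::CreatePtrAdd emits exactly this kind of i8 GEP, so the line could plausibly be written as (a sketch, assuming the CreatePtrAdd overload that takes no-wrap flags):

Value *ResultPtr = Builder.CreatePtrAdd(Ptr, Offset, "", getGEPNoWrapFlags());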

  Value *Ptr = State.get(getOperand(0), VPLane(0));
  Value *Offset = State.get(getOffset(), true);
-  Value *ResultPtr = Builder.CreateGEP(getSourceElementType(), Ptr, Offset, "",
+  Value *ResultPtr = Builder.CreateGEP(Builder.getInt8Ty(), Ptr, Offset, "",
Contributor


Builder.createPtrAdd?

Comment on lines +315 to +316
Plan.getConstantInt(
IndexTy, DL.getTypeAllocSize(VPR->getSourceElementType()))},
Copy link
Contributor


Would probably hoist this for better wrapping?
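A hoisted form might look like this (a sketch using only calls already present in the patch):

VPValue *EltSize = Plan.getConstantInt(
    IndexTy, DL.getTypeAllocSize(VPR->getSourceElementType()));
VPValue *Offset = Builder.createOverflowingOp(
    Instruction::Mul, {VFxPart, EltSize}, {true, true});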

@github-actions

github-actions bot commented Jan 8, 2026

🐧 Linux x64 Test Results

  • 167786 tests passed
  • 2978 tests skipped
  • 1 test failed

Failed Tests


LLVM

LLVM.CodeGen/AArch64/aggressive-interleaving.ll
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/opt < /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AArch64/aggressive-interleaving.ll -passes=loop-vectorize -S -mtriple=aarch64-unknown-linux-gnu -mcpu=cortex-a320 2>&1 | /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AArch64/aggressive-interleaving.ll --check-prefix=A320
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/opt -passes=loop-vectorize -S -mtriple=aarch64-unknown-linux-gnu -mcpu=cortex-a320
# note: command had no output on stdout or stderr
# executed command: /home/gha/actions-runner/_work/llvm-project/llvm-project/build/bin/FileCheck /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AArch64/aggressive-interleaving.ll --check-prefix=A320
# .---command stderr------------
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AArch64/aggressive-interleaving.ll:165:14: error: A320-NEXT: expected string not found in input
# | ; A320-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, ptr [[TMP3]], i64 2
# |              ^
# | <stdin>:121:56: note: scanning from here
# |  %0 = getelementptr inbounds double, ptr %a, i64 %index
# |                                                        ^
# | <stdin>:121:56: note: with "TMP3" equal to "%0"
# |  %0 = getelementptr inbounds double, ptr %a, i64 %index
# |                                                        ^
# | <stdin>:145:2: note: possible intended match here
# |  %gep = getelementptr inbounds double, ptr %a, i64 %iv
# |  ^
# | /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AArch64/aggressive-interleaving.ll:247:14: error: A320-NEXT: expected string not found in input
# | ; A320-NEXT: [[TMP2:%.*]] = getelementptr inbounds double, ptr [[TMP3]], i64 2
# |              ^
# | <stdin>:180:56: note: scanning from here
# |  %1 = getelementptr inbounds double, ptr %b, i64 %index
# |                                                        ^
# | <stdin>:180:56: note: with "TMP3" equal to "%0"
# |  %1 = getelementptr inbounds double, ptr %b, i64 %index
# |                                                        ^
# | <stdin>:209:4: note: possible intended match here
# |  %gep.a = getelementptr inbounds double, ptr %a, i64 %iv
# |    ^
# | 
# | Input file: <stdin>
# | Check file: /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/test/CodeGen/AArch64/aggressive-interleaving.ll
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             .
# |             .
# |             .
# |           116:  
# |           117: vector.body: ; preds = %vector.body, %vector.ph 
# |           118:  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] 
# |           119:  %vec.phi = phi <2 x double> [ zeroinitializer, %vector.ph ], [ %2, %vector.body ] 
# |           120:  %vec.phi1 = phi <2 x double> [ zeroinitializer, %vector.ph ], [ %3, %vector.body ] 
# |           121:  %0 = getelementptr inbounds double, ptr %a, i64 %index 
# | next:165'0                                                            X error: no match found
# | next:165'1                                                              with "TMP3" equal to "%0"
# |           122:  %1 = getelementptr inbounds i8, ptr %0, i64 16 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           123:  %wide.load = load <2 x double>, ptr %0, align 8 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           124:  %wide.load2 = load <2 x double>, ptr %1, align 8 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           125:  %2 = fadd fast <2 x double> %vec.phi, %wide.load 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           126:  %3 = fadd fast <2 x double> %vec.phi1, %wide.load2 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |             .
# |             .
# |             .
# |           140:  br label %loop 
# | next:165'0     ~~~~~~~~~~~~~~~~
# |           141:  
# | next:165'0     ~
# |           142: loop: ; preds = %scalar.ph, %loop 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           143:  %iv = phi i64 [ %bc.resume.val, %scalar.ph ], [ %iv.next, %loop ] 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           144:  %sum = phi double [ %bc.merge.rdx, %scalar.ph ], [ %sum.next, %loop ] 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           145:  %gep = getelementptr inbounds double, ptr %a, i64 %iv 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | next:165'2      ?                                                      possible intended match
# |           146:  %val = load double, ptr %gep, align 8 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           147:  %sum.next = fadd fast double %sum, %val 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           148:  %iv.next = add nuw nsw i64 %iv, 1 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           149:  %cond = icmp ult i64 %iv.next, %n 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           150:  br i1 %cond, label %loop, label %exit.loopexit, !llvm.loop !5 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |             .
# |             .
# |             .
# |           175: vector.body: ; preds = %vector.body, %vector.ph 
# |           176:  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] 
# |           177:  %vec.phi = phi <2 x double> [ zeroinitializer, %vector.ph ], [ %6, %vector.body ] 
# |           178:  %vec.phi1 = phi <2 x double> [ zeroinitializer, %vector.ph ], [ %7, %vector.body ] 
# |           179:  %0 = getelementptr inbounds double, ptr %a, i64 %index 
# |           180:  %1 = getelementptr inbounds double, ptr %b, i64 %index 
# | next:247'0                                                            X error: no match found
# | next:247'1                                                              with "TMP3" equal to "%0"
# |           181:  %2 = getelementptr inbounds i8, ptr %0, i64 16 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           182:  %wide.load = load <2 x double>, ptr %0, align 8 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           183:  %wide.load2 = load <2 x double>, ptr %2, align 8 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           184:  %3 = getelementptr inbounds i8, ptr %1, i64 16 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           185:  %wide.load3 = load <2 x double>, ptr %1, align 8 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |             .
# |             .
# |             .
# |           204:  br label %loop 
# | next:247'0     ~~~~~~~~~~~~~~~~
# |           205:  
# | next:247'0     ~
# |           206: loop: ; preds = %scalar.ph, %loop 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           207:  %iv = phi i64 [ %bc.resume.val, %scalar.ph ], [ %iv.next, %loop ] 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           208:  %acc = phi double [ %bc.merge.rdx, %scalar.ph ], [ %acc.next, %loop ] 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           209:  %gep.a = getelementptr inbounds double, ptr %a, i64 %iv 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | next:247'2        ?                                                      possible intended match
# |           210:  %gep.b = getelementptr inbounds double, ptr %b, i64 %iv 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           211:  %va = load double, ptr %gep.a, align 8 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           212:  %vb = load double, ptr %gep.b, align 8 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           213:  %prod = fmul fast double %va, %vb 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           214:  %acc.next = fadd fast double %acc, %prod 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |             .
# |             .
# |             .
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

--

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the infrastructure label.

@github-actions

github-actions bot commented Jan 8, 2026

🪟 Windows x64 Test Results

  • 129227 tests passed
  • 2852 tests skipped
  • 1 test failed

Failed Tests


LLVM

LLVM.CodeGen/AArch64/aggressive-interleaving.ll
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
c:\_work\llvm-project\llvm-project\build\bin\opt.exe < C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AArch64\aggressive-interleaving.ll -passes=loop-vectorize -S -mtriple=aarch64-unknown-linux-gnu -mcpu=cortex-a320 2>&1 | c:\_work\llvm-project\llvm-project\build\bin\filecheck.exe C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AArch64\aggressive-interleaving.ll --check-prefix=A320
# executed command: 'c:\_work\llvm-project\llvm-project\build\bin\opt.exe' -passes=loop-vectorize -S -mtriple=aarch64-unknown-linux-gnu -mcpu=cortex-a320
# note: command had no output on stdout or stderr
# executed command: 'c:\_work\llvm-project\llvm-project\build\bin\filecheck.exe' 'C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AArch64\aggressive-interleaving.ll' --check-prefix=A320
# .---command stderr------------
# | C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AArch64\aggressive-interleaving.ll:165:14: error: A320-NEXT: expected string not found in input
# | ; A320-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, ptr [[TMP3]], i64 2
# |              ^
# | <stdin>:121:56: note: scanning from here
# |  %0 = getelementptr inbounds double, ptr %a, i64 %index
# |                                                        ^
# | <stdin>:121:56: note: with "TMP3" equal to "%0"
# |  %0 = getelementptr inbounds double, ptr %a, i64 %index
# |                                                        ^
# | <stdin>:145:2: note: possible intended match here
# |  %gep = getelementptr inbounds double, ptr %a, i64 %iv
# |  ^
# | C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AArch64\aggressive-interleaving.ll:247:14: error: A320-NEXT: expected string not found in input
# | ; A320-NEXT: [[TMP2:%.*]] = getelementptr inbounds double, ptr [[TMP3]], i64 2
# |              ^
# | <stdin>:180:56: note: scanning from here
# |  %1 = getelementptr inbounds double, ptr %b, i64 %index
# |                                                        ^
# | <stdin>:180:56: note: with "TMP3" equal to "%0"
# |  %1 = getelementptr inbounds double, ptr %b, i64 %index
# |                                                        ^
# | <stdin>:209:4: note: possible intended match here
# |  %gep.a = getelementptr inbounds double, ptr %a, i64 %iv
# |    ^
# | 
# | Input file: <stdin>
# | Check file: C:\_work\llvm-project\llvm-project\llvm\test\CodeGen\AArch64\aggressive-interleaving.ll
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             .
# |             .
# |             .
# |           116:  
# |           117: vector.body: ; preds = %vector.body, %vector.ph 
# |           118:  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] 
# |           119:  %vec.phi = phi <2 x double> [ zeroinitializer, %vector.ph ], [ %2, %vector.body ] 
# |           120:  %vec.phi1 = phi <2 x double> [ zeroinitializer, %vector.ph ], [ %3, %vector.body ] 
# |           121:  %0 = getelementptr inbounds double, ptr %a, i64 %index 
# | next:165'0                                                            X error: no match found
# | next:165'1                                                              with "TMP3" equal to "%0"
# |           122:  %1 = getelementptr inbounds i8, ptr %0, i64 16 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           123:  %wide.load = load <2 x double>, ptr %0, align 8 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           124:  %wide.load2 = load <2 x double>, ptr %1, align 8 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           125:  %2 = fadd fast <2 x double> %vec.phi, %wide.load 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           126:  %3 = fadd fast <2 x double> %vec.phi1, %wide.load2 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |             .
# |             .
# |             .
# |           140:  br label %loop 
# | next:165'0     ~~~~~~~~~~~~~~~~
# |           141:  
# | next:165'0     ~
# |           142: loop: ; preds = %scalar.ph, %loop 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           143:  %iv = phi i64 [ %bc.resume.val, %scalar.ph ], [ %iv.next, %loop ] 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           144:  %sum = phi double [ %bc.merge.rdx, %scalar.ph ], [ %sum.next, %loop ] 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           145:  %gep = getelementptr inbounds double, ptr %a, i64 %iv 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | next:165'2      ?                                                      possible intended match
# |           146:  %val = load double, ptr %gep, align 8 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           147:  %sum.next = fadd fast double %sum, %val 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           148:  %iv.next = add nuw nsw i64 %iv, 1 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           149:  %cond = icmp ult i64 %iv.next, %n 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           150:  br i1 %cond, label %loop, label %exit.loopexit, !llvm.loop !5 
# | next:165'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |             .
# |             .
# |             .
# |           175: vector.body: ; preds = %vector.body, %vector.ph 
# |           176:  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] 
# |           177:  %vec.phi = phi <2 x double> [ zeroinitializer, %vector.ph ], [ %6, %vector.body ] 
# |           178:  %vec.phi1 = phi <2 x double> [ zeroinitializer, %vector.ph ], [ %7, %vector.body ] 
# |           179:  %0 = getelementptr inbounds double, ptr %a, i64 %index 
# |           180:  %1 = getelementptr inbounds double, ptr %b, i64 %index 
# | next:247'0                                                            X error: no match found
# | next:247'1                                                              with "TMP3" equal to "%0"
# |           181:  %2 = getelementptr inbounds i8, ptr %0, i64 16 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           182:  %wide.load = load <2 x double>, ptr %0, align 8 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           183:  %wide.load2 = load <2 x double>, ptr %2, align 8 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           184:  %3 = getelementptr inbounds i8, ptr %1, i64 16 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           185:  %wide.load3 = load <2 x double>, ptr %1, align 8 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |             .
# |             .
# |             .
# |           204:  br label %loop 
# | next:247'0     ~~~~~~~~~~~~~~~~
# |           205:  
# | next:247'0     ~
# |           206: loop: ; preds = %scalar.ph, %loop 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           207:  %iv = phi i64 [ %bc.resume.val, %scalar.ph ], [ %iv.next, %loop ] 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           208:  %acc = phi double [ %bc.merge.rdx, %scalar.ph ], [ %acc.next, %loop ] 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           209:  %gep.a = getelementptr inbounds double, ptr %a, i64 %iv 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | next:247'2        ?                                                      possible intended match
# |           210:  %gep.b = getelementptr inbounds double, ptr %b, i64 %iv 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           211:  %va = load double, ptr %gep.a, align 8 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           212:  %vb = load double, ptr %gep.b, align 8 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           213:  %prod = fmul fast double %va, %vb 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |           214:  %acc.next = fadd fast double %acc, %prod 
# | next:247'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# |             .
# |             .
# |             .
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

--

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the infrastructure label.

Contributor

@fhahn fhahn left a comment

Currently, VPVectorPointerRecipe has the form: vector-pointer %ptr, (%offset).
This makes it difficult to add a stride operand to VPVectorPointerRecipe for #147297.

Could you add more detail on what is currently making this difficult?

@Mel-Chen
Contributor Author

Mel-Chen commented Jan 9, 2026

Currently, VPVectorPointerRecipe has the form: vector-pointer %ptr, (%offset).
This makes it difficult to add a stride operand to VPVectorPointerRecipe for #147297.

Could you add more detail on what is currently making this difficult?

Sure.

Strided access requires vector-pointer %base_ptr, %stride. Setting byte GEPs aside for now and still using typed GEPs, this recipe will eventually produce

gep <EltType> %base_ptr, %VF * part * %stride.
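
For reference, a minimal LLVM IR sketch of that typed-GEP form, assuming double elements; the names %vf.x.part (standing for %VF * part) and %vec.ptr are illustrative, not from the patch:

%off = mul i64 %vf.x.part, %stride                                  ; %VF * part * %stride
%vec.ptr = getelementptr inbounds double, ptr %base_ptr, i64 %off   ; typed GEP, offset in elements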

After rebasing, #147297 provides a quick and simple approach: changing VPVectorPointerRecipe from

vector-pointer %base_ptr, (%offset)

to

vector-pointer %base_ptr, %stride, (%offset).

As I understand it, %offset = %VF * part * %stride. Once %offset is computed after unrolling, there is no need to keep the now-unused %stride. If we want to drop %stride after %offset is available, operand 1 of VPVectorPointerRecipe could end up being either %stride or %offset, i.e.
vector-pointer %base_ptr, %stride | %offset.
This would likely require adding a bool isUnrolled to distinguish which field it represents, but I don’t think that’s a good approach.

Another direction would be to rename offset (I don’t have a good name yet, so let’s call it VFxPart for now). Initially, we would only create

vector-pointer %base_ptr, %stride

and during unrolling change it to

vector-pointer %base_ptr, %stride, %VFxPart

Then, in VPVectorPointerRecipe::execute, we would compute
gep <EltType> %base_ptr, %VFxPart * %stride.

Which approach to take depends on whether we want all Mul operations of VPVectorPointerRecipe to be explicitly represented as recipes in VPlan.
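
To illustrate the two options in the same informal notation as above (names are illustrative, not from the patch): keeping the Mul implicit means the recipe computes it internally in execute,

vector-pointer %base_ptr, %stride, %VFxPart   ; Mul hidden inside execute

while making it explicit splits the recipe into plain VPlan arithmetic:

%off = mul %VFxPart, %stride
ptradd %base_ptr, %off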

@Mel-Chen
Contributor Author

Mel-Chen commented Jan 9, 2026

One approach to solve this is to convert VPVectorPointerRecipe into an abstract recipe with the form vector-pointer %ptr, %stride_in_bytes, and revert to handling the unroll part via the VPUnrollPartAccessor approach

My understanding was that we wanted to move away from VPUnrollPartAccessor and unroll things directly in VPlanUnroll. Is there a reason why we can't use stride_in_bytes to unroll in VPlanUnroll? I think that would also allow us to unify it with VPVectorEndPointerRecipe, see #172372

I’m wondering whether we could explicitly add another operand to represent part instead:

vector-pointer %ptr, 0

--> Unrolling -->
vector-pointer %ptr, 0
vector-pointer %ptr, 1

--> convertToConcreteRecipes -->
%ptr
%offset = %VF * 1 * EltSize
ptradd %ptr, %offset
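
As a concrete instance, a sketch of what part 1 would become, assuming VF = 2 and double elements (EltSize = 8); %gep.part1 is an illustrative name, and the 16-byte offset matches the byte GEPs visible in the test output above:

%gep.part1 = getelementptr inbounds i8, ptr %ptr, i64 16   ; 2 (VF) * 1 (part) * 8 (EltSize) bytes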

@Mel-Chen
Contributor Author

Another direction would be to rename offset (I don’t have a good name yet, so let’s call it VFxPart for now). Initially, we would only create

vector-pointer %base_ptr, %stride

and during unrolling change it to

vector-pointer %base_ptr, %stride, %VFxPart

Then, in VPVectorPointerRecipe::execute, we would compute gep <EltType> %base_ptr, %VFxPart * %stride.

I’ve temporarily gone with this approach in #147297. In 61f0b2a, you can see that whether the stride multiplication is explicitly pulled out during unrolling does have some impact, but not much.

Which approach to take depends on whether we want all Mul operations of VPVectorPointerRecipe to be explicitly represented as recipes in VPlan.

So I think this topic can be deferred for now.

@Mel-Chen Mel-Chen marked this pull request as draft January 12, 2026 09:26
@lukel97
Contributor

lukel97 commented Jan 12, 2026

I’ve temporarily gone with this approach in #147297. In 61f0b2a, you can see that whether the stride multiplication is explicitly pulled out during unrolling does have some impact, but not much.

I'm confused by this direction; it seems to make VPVectorPointerRecipe more complicated, and I'm not sure why that's a better approach than this PR.

To make sure I understand the approach this PR would have taken:

As a bonus to this approach, VPUnroll could just directly unroll VPVectorPointerRecipes to VPInstruction::PtrAdds. We can remove VPVectorPointerRecipe::execute. It also addresses your comment about having a leftover unused stride operand.
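
A rough sketch of that unrolling in the same informal notation used earlier, assuming the vector-pointer %ptr, %stride_in_bytes form (operand names are illustrative):

vector-pointer %ptr, %stride_in_bytes

--> Unrolling (UF = 2) -->
%ptr                                 ; part 0
%off.1 = mul %VF, %stride_in_bytes   ; part 1
ptradd %ptr, %off.1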

I'd much prefer we continue to simplify VPVectorPointerRecipe as much as possible with this PR, unless there's something I'm maybe overlooking.

@Mel-Chen
Contributor Author

I'm confused by this direction; it seems to make VPVectorPointerRecipe more complicated, and I'm not sure why that's a better approach than this PR.

To make sure I understand the approach this PR would have taken:

As a bonus to this approach, VPUnroll could just directly unroll VPVectorPointerRecipes to VPInstruction::PtrAdds. We can remove VPVectorPointerRecipe::execute. It also addresses your comment about having a leftover unused stride operand.

I'd much prefer we continue to simplify VPVectorPointerRecipe as much as possible with this PR, unless there's something I'm maybe overlooking.

Yes, your understanding is correct.
However, from what I can see, #147297 doesn't have a strong dependency on this approach. We could land #147297 first and then come back to refactor VPVectorPointerRecipe.

Or, if there are other PRs that depend on this patch, I'd be happy to reopen it and continue pushing it forward. :)
