Skip to content
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 16 additions & 5 deletions src/coreclr/jit/emitarm64.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -16248,6 +16248,12 @@ bool emitter::ReplaceLdrStrWithPairInstr(
// ldr w1, [x20, #0x14]
// ldr w2, [x20, #0x10] => ldp w2, w1, [x20, #0x10]
//
// ldr w1, [x20]
// ldr w2, [x20, #0x04] => ldp w1, w2, [x20]
//
// ldr q1, [x0, #0x20]
// ldr q2, [x0, #0x30] => ldp q1, q2, [x0, #0x20]
//
// Arguments:
// ins - The instruction code
// reg1 - Register 1 number
Expand Down Expand Up @@ -16291,16 +16297,21 @@ emitter::RegisterOrder emitter::IsOptimizableLdrStrWithPair(
return eRO_none;
}

if ((!isGeneralRegisterOrZR(reg1)) || (!isGeneralRegisterOrZR(prevReg1)))
if (reg1 == REG_SP || prevReg1 == REG_SP || (isGeneralRegisterOrZR(reg1) && isVectorRegister(prevReg1)) ||
(isVectorRegister(reg1) && isGeneralRegisterOrZR(prevReg1)))
{
// Either register 1 is not a general register or previous register 1 is not a general register
// or the zero register, so we cannot optimise.
// We cannot optimise when one of the following conditions are met
// 1. reg1 or prevReg1 is SP
// 2. both reg1 and prevReg1 are not of the same type (SIMD or non-SIMD)
return eRO_none;
}

if (lastInsFmt != fmt)
if (lastInsFmt != fmt && !(lastInsFmt == IF_LS_2B && fmt == IF_LS_2A) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering why we have to add additional checks for IF_LS_2B and IF_LS_2A? Are they specifically because we are adding vector register support? Why were they not needed previously?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the issue is here that we use IF_LS_2A for base (no offset) and IF_LS_2B for base + offset? Presumably imm/prevImm are correctly zero for 2A?

Would it be easier to read inverted? i.e.,

const bool compatibleFmt = (lastInsFmt == fmt) || (lastInsFmt == IF_LS_2B && fmt == IF_LS_2A) || (lastInsFmt == IF_LS_2A && fmt == IF_LS_2B);
if (!compatibleFmt) {... return eRO_none; }

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any general register (non-Vector) diffs from just this change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the issue is here that we use IF_LS_2A for base (no offset) and IF_LS_2B for base + offset?

Sure, but we don't do it for GPR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry that the title of the PR doesn't specifies the full functionality. Explicit format check is allowing us to catch the consecutive ldr/str where one instruction uses the offset and one without. This is applicable for both general purpose and SIMD/Vector registers.

Are there any general register (non-Vector) diffs from just this change?

Yup, there are multiple such changes. e.g.,

-4 (-16.67%) : 2709.dasm - System.ValueTuple`2[long,System.DateTime]:.ctor(long,System.DateTime):this
@@ -20,15 +20,14 @@ G_M30325_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 						;; size=8 bbWeight=1 PerfScore 1.50
 G_M30325_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0001 {x0}, byref
             ; byrRegs +[x0]
-            str     x1, [x0]
-            str     x2, [x0, #0x08]
-						;; size=8 bbWeight=1 PerfScore 2.00
+            stp     x1, x2, [x0]
+						;; size=4 bbWeight=1 PerfScore 1.00
 G_M30325_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 
-; Total bytes of code 24, prolog size 8, PerfScore 7.90, instruction count 6, allocated bytes for code 24 (MethodHash=b3c1898a) for method System.ValueTuple`2[long,System.DateTime]:.ctor(long,System.DateTime):this
+; Total bytes of code 20, prolog size 8, PerfScore 6.50, instruction count 5, allocated bytes for code 20 (MethodHash=b3c1898a) for method System.ValueTuple`2[long,System.DateTime]:.ctor(long,System.DateTime):this
 ; ============================================================
 

Sure, but we don't do it for GPR?

Yup, I think it was missed previously.

Matching the consecutive ldr/str with mixed formatting is letting us further optimise what the previous optimisation would have allowed us. e.g.,

Previously, the following sequence

str     s0, [x0]
str     s1, [x0, #0x04]
str     s2, [x0, #0x08]
str     s3, [x0, #0x0C]

may have been optimised to

str     s0, [x0]
stp     s1, s2, [x0, #0x04]
str     s3, [x0, #0x0C]

but now would be optimised to

stp     s0, s1, [x0]
stp     s2, s3, [x0, #0x08]

Copy link
Contributor Author

@SwapnilGaikwad SwapnilGaikwad Mar 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be easier to read inverted? i.e.,

Sure, this is more readable. Done 👍

!(lastInsFmt == IF_LS_2A && fmt == IF_LS_2B))
{
// The formats of the two instructions differ.
// We cannot optimise when all of the following conditions are met
// 1. instruction formats differ
// 2. instructions are not using "base" or "base plus immediate offset" addressing modes
return eRO_none;
}

Expand Down