Closed
Labels
area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
enhancement: Product code improvement that does NOT require public API changes/additions
optimization
Description
When passing a non-constant value into the Span<T> and ReadOnlySpan<T> indexer, the JIT will emit an unnecessary movsxd instruction on x64. The repro is fairly simple:
for (int i = 0; i < ints.Length; i++)
{
    retVal += ints[i];
}

Current codegen:
00007ffd`2a2e7291 85c9 test ecx,ecx
00007ffd`2a2e7293 7e0f jle <AFTER_LOOP>
00007ffd`2a2e7295 4d63d1 movsxd r10,r9d
00007ffd`2a2e7298 42030492 add eax,dword ptr [rdx+r10*4]
00007ffd`2a2e729c 41ffc1 inc r9d
00007ffd`2a2e729f 443bc9 cmp r9d,ecx
00007ffd`2a2e72a2 7cf1 jl 00007ffd`2a2e7295

I prototyped the change below in a local branch by modifying the span indexer logic in importer.cpp to use zero extension instead of sign extension, then ran a benchmark. The modified code took approximately one-third less time to run. This optimization may be worth investigating if we believe that developers are iterating over spans in hot loops. (Admittedly, any more complex logic within the loop would almost certainly overwhelm these benchmark results.)
// Element access
GenTree* indexIntPtr = gtNewCastNode(TYP_U_IMPL, indexClone, true /* fromUnsigned */, TYP_U_IMPL); // <-- modified line
GenTree* sizeofNode = gtNewIconNode(elemSize);
GenTree* mulNode = gtNewOperNode(GT_MUL, TYP_U_IMPL, indexIntPtr, sizeofNode); // <-- modified line

| Method | Toolchain | SpanLength | Mean | Error | StdDev | Ratio | RatioSD |
|---|---|---|---|---|---|---|---|
| SumInts | baseline | 48 | 2,921.32 us | 53.786 us | 44.914 us | 1.00 | 0.00 |
| SumInts | modified | 48 | 1,964.96 us | 38.825 us | 43.154 us | 0.67 | 0.02 |
| SumInts | baseline | 512 | 35,429.46 us | 574.800 us | 537.669 us | 1.00 | 0.00 |
| SumInts | modified | 512 | 23,219.06 us | 457.335 us | 698.398 us | 0.67 | 0.02 |
| SumInts | baseline | 2048 | 139,664.62 us | 1,799.241 us | 1,683.011 us | 1.00 | 0.00 |
| SumInts | modified | 2048 | 93,175.18 us | 1,838.916 us | 3,586.665 us | 0.66 | 0.04 |
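
Why swapping sign extension for zero extension should be safe here can be sketched in a few lines of standalone C++. This is an illustration only, not JIT code: it assumes the span indexer rejects out-of-range indices with a single unsigned comparison before the element access, and the helper names (`widen_signed`, `widen_unsigned`, `in_bounds`, `element_offset`, `extensions_agree_in_bounds`) are hypothetical. Under that assumption, every index that reaches the address computation is non-negative, so both ways of widening it to 64 bits give the same value.

```cpp
#include <cassert>
#include <cstdint>

// Models movsxd: widen a 32-bit index to 64 bits by sign extension.
std::uint64_t widen_signed(std::int32_t index) {
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(index));
}

// Models zero extension: the upper 32 bits of the result are cleared.
std::uint64_t widen_unsigned(std::int32_t index) {
    return static_cast<std::uint64_t>(static_cast<std::uint32_t>(index));
}

// Assumed span-style bounds check: one unsigned comparison rejects both
// negative indices and indices >= length (length is never negative).
bool in_bounds(std::int32_t index, std::int32_t length) {
    return static_cast<std::uint32_t>(index) < static_cast<std::uint32_t>(length);
}

// Byte offset of an element using the zero-extended index, mirroring the
// "cast, then multiply by element size" shape of the modified lines.
std::uint64_t element_offset(std::int32_t index, std::uint64_t elem_size) {
    return widen_unsigned(index) * elem_size;
}

// Returns true if both widenings agree for every index that passes the
// bounds check, probing a few out-of-range values on either side as well.
bool extensions_agree_in_bounds(std::int32_t length) {
    for (std::int32_t i = -4; i < length + 4; ++i) {
        if (in_bounds(i, length) && widen_signed(i) != widen_unsigned(i)) {
            return false;
        }
    }
    return true;
}
```

The two widenings differ only for negative indices (for example `-1`), and those never pass the bounds check, which is the property the prototype depends on.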
/cc @dotnet/jit-contrib
category:cq
theme:basic-cq
skill-level:expert
cost:medium
impact:medium