Handle some additional floating-point and SIMD morph optimizations #122368

tannergooding · 2025-12-09T23:46:34Z

This resolves #122272

…nstant folding

dotnet-policy-service · 2025-12-09T23:47:41Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

BoyBaykiller · 2025-12-10T00:51:35Z

This existing transform to remove double negation doesn't seem to apply here.

runtime/src/coreclr/jit/morph.cpp

Lines 8117 to 8125 in 0fb17f7

    
    // Remove double negation/not.
    if (op1->OperIs(oper))
    {
        JITDUMP("Remove double negation/not\n")
        GenTree* op1op1 = op1->gtGetOp1();
        DEBUG_DESTROY_NODE(tree);
        DEBUG_DESTROY_NODE(op1);
        return op1op1;
    }

Probably some ordering issue?

double NegateMulTest(double num)
{
    num *= -1;
    num *= -1;
    num *= -1;

    return num;
}

G_M43396_IG02:  ;; offset=0x0000
       vxorps   xmm0, xmm0, xmmword ptr [reloc @RWD00]
       vxorps   xmm0, xmm0, xmmword ptr [reloc @RWD00]
       vxorps   xmm0, xmm0, xmmword ptr [reloc @RWD00]

Meanwhile:

double NegateMulOld(double num)
{
    num = -num;
    num = -num;
    num = -num;

    return num;
}

G_M22709_IG02:  ;; offset=0x0000
       vxorps   xmm0, xmm0, xmmword ptr [reloc @RWD00]
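
For readers less familiar with this codegen: each `vxorps` above presumably XORs the value with a sign-bit mask stored at `[reloc @RWD00]`. The dump does not show the mask contents, so that is an assumption. A minimal C++ sketch of the idea, using hypothetical helper names, also shows why three negations leave exactly one net sign flip:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: negate a double by flipping its IEEE 754 sign bit,
// which is what an XOR against a sign-mask constant does.
double negate_via_xor(double value)
{
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    bits ^= 0x8000000000000000ULL; // assumed mask: only the sign bit is set
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

// Mirrors the shape of NegateMulTest: three sign flips in a row.
double negate_three_times(double value)
{
    return negate_via_xor(negate_via_xor(negate_via_xor(value)));
}
```

Since an even number of flips cancels out, `negate_three_times` and a single `negate_via_xor` produce the same bits, which is the result the single-`vxorps` output reflects.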

Copilot

Pull request overview

This PR implements additional floating-point and SIMD morph optimizations by moving and extending transformation logic. The changes include optimizations for division-to-multiplication conversion (when divisor is a power of 2), multiplication by -1 (converting to negation), and multiplication by 2 (converting to addition).

Key changes:

- Relocated the scalar floating-point division-to-multiplication optimization from early in the GT_DIV case to after the GT_NEG transformation, ensuring it applies the reciprocal optimization correctly after sign transformations.
- Added new scalar floating-point optimizations for multiplication by -1 and 2 in fgOptimizeMultiply.
- Extended the SIMD GT_DIV and GT_MUL cases with similar optimizations, including division-to-multiplication for power-of-2 divisors, multiplication by -1 to negation, and multiplication by 2 to addition.
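
The arithmetic identities behind these three rewrites can be sketched in standalone C++. This is only an illustration of the math, not the JIT code: every function name below is hypothetical, and the exact conditions the JIT checks may differ.

```cpp
#include <cmath>
#include <cstring>

// Hypothetical helper: true when 1.0 / divisor is exactly representable,
// i.e. the divisor is a finite, nonzero power of two with a finite reciprocal.
bool has_exact_reciprocal(double divisor)
{
    if (!std::isfinite(divisor) || divisor == 0.0)
        return false;

    int exponent = 0;
    // frexp yields a mantissa in [0.5, 1); powers of two give exactly 0.5.
    double mantissa = std::frexp(std::fabs(divisor), &exponent);
    return mantissa == 0.5 && std::isfinite(1.0 / divisor);
}

// value / cnsPow2  =>  value * (1.0 / cnsPow2), only when the reciprocal is exact.
double div_as_mul(double value, double divisor)
{
    return has_exact_reciprocal(divisor) ? value * (1.0 / divisor) : value / divisor;
}

// value * -1.0  =>  -value
double mul_neg_one_as_neg(double value)
{
    return -value;
}

// value * 2.0  =>  value + value
double mul_two_as_add(double value)
{
    return value + value;
}

// Bitwise comparison, so that +0.0 and -0.0 are treated as different results.
bool same_bits(double left, double right)
{
    return std::memcmp(&left, &right, sizeof(double)) == 0;
}
```

For example, `same_bits(div_as_mul(1.7, 8.0), 1.7 / 8.0)` holds because 0.125 is exactly representable, whereas a divisor such as 3.0 takes the plain division path.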

Comments suppressed due to low confidence (1)

src/coreclr/jit/morph.cpp:10561

The DEBUG_DESTROY_NODE is missing for the original mul node. This case creates a new add node and returns it but doesn't destroy the original multiplication node, similar to the issue on line 9846. For consistency with line 10535 which properly destroys mul when returning op1, this case should also include DEBUG_DESTROY_NODE(mul) before returning add.

            DEBUG_DESTROY_NODE(op2);
            op2          = fgMakeMultiUse(&op1);
            GenTree* add = gtNewOperNode(GT_ADD, mul->TypeGet(), op1, op2);
            add->SetMorphed(this, /* doChildren */ true);
            return add;

src/coreclr/jit/morph.cpp

tannergooding · 2025-12-11T02:00:12Z

> This existing transform to remove double negation doesn't seem to apply here.

Yes, nothing attempts to "re-morph" the node here once the new operation is recognized. I added a "hack" specifically to handle this case. Ideally a larger refactoring of fgMorphSmpOp would make this easier to reuse, much as already exists for fgMorphHWIntrinsic.

(re-morph is in quotes because we don't actually try to morph twice and there are various transforms we don't want to repeat, but there is some post-order processing we can defer back to after the oper type changes if we have the right helper methods).
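
The ordering issue can be illustrated with a toy expression tree. The sketch below is not the JIT's `GenTree` or `fgMorphSmpOp`; all types and functions are hypothetical and exist only to show why the NEG-specific post-order step has to run again after `MUL(x, -1.0)` becomes `NEG(x)`.

```cpp
#include <memory>
#include <utility>

// Toy expression node; NOT the JIT's GenTree. All names are hypothetical.
enum class Oper { Value, Const, Neg, Mul };

struct Node
{
    Oper oper = Oper::Value;
    double constant = 0.0; // used only by Oper::Const
    std::unique_ptr<Node> op1;
    std::unique_ptr<Node> op2;
};

using NodePtr = std::unique_ptr<Node>;

NodePtr make_node(Oper oper, NodePtr op1 = nullptr, NodePtr op2 = nullptr, double constant = 0.0)
{
    auto node = std::make_unique<Node>();
    node->oper = oper;
    node->constant = constant;
    node->op1 = std::move(op1);
    node->op2 = std::move(op2);
    return node;
}

// Post-order step for NEG nodes: NEG(NEG(x)) => x.
NodePtr fold_double_negation(NodePtr node)
{
    if (node->oper == Oper::Neg && node->op1->oper == Oper::Neg)
        return std::move(node->op1->op1);
    return node;
}

// Morph operands first, then the node itself (post-order).
NodePtr morph(NodePtr node)
{
    if (node->op1) node->op1 = morph(std::move(node->op1));
    if (node->op2) node->op2 = morph(std::move(node->op2));

    if (node->oper == Oper::Mul && node->op2->oper == Oper::Const && node->op2->constant == -1.0)
    {
        // MUL(x, -1.0) => NEG(x)
        NodePtr neg = make_node(Oper::Neg, std::move(node->op1));
        // The oper changed, so defer to the NEG step; skipping this call
        // would leave NEG(NEG(x)) in the tree.
        return fold_double_negation(std::move(neg));
    }
    return fold_double_negation(std::move(node));
}

// Builds x * -1.0 * -1.0 * ... with `count` multiplications.
NodePtr build_negate_mul_chain(int count)
{
    NodePtr tree = make_node(Oper::Value);
    for (int i = 0; i < count; ++i)
        tree = make_node(Oper::Mul, std::move(tree), make_node(Oper::Const, nullptr, nullptr, -1.0));
    return tree;
}

int count_negations(const Node* node)
{
    if (node == nullptr)
        return 0;
    return (node->oper == Oper::Neg ? 1 : 0) + count_negations(node->op1.get()) + count_negations(node->op2.get());
}
```

With the deferred step in place, `morph(build_negate_mul_chain(3))` reduces to a single `NEG(x)`, matching the one-`vxorps` output of the hand-written negation version.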

tannergooding · 2025-12-11T04:49:40Z

CC. @dotnet/jit-contrib, @EgorBo

No TP regression and some minor positive diffs: https://dev.azure.com/dnceng-public/public/_build/results?buildId=1227641&view=ms.vss-build-web.run-extensions-tab

A few simplifications show up for each case added, such as negating instead of multiplying, or adding instead of multiplying by two.

There does look to be a missing AVX-512 containment opportunity around vxorps xmm0, xmm1, xmmword ptr [reloc], which could be vxorps xmm0, xmm1, dword ptr [reloc] {1to4} instead, but that's unrelated to this PR.

There are also a couple of small size regressions on x64, where different register selection changes the codegen output minimally or causes an extra prologue spill and epilogue restore. There's also one case where some code divides and multiplies by the same power of 2 value, so converting the division to multiplication by a reciprocal causes an extra constant to exist and be used. Handling this is probably difficult without some kind of interplay with CSE/VN to see the number of uses of a given value (so a decision can be made on whether it's more expensive to do a second load from L1 or to spend extra cycles doing division). The cases in the diff are all still profitable ones, so the PR looks good overall.
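
The extra-constant case can be pictured with a small C++ sketch. The function names and the value 1024.0 are hypothetical and are not taken from the diff; the point is only that the original code shares one constant, while the transformed division needs a second one.

```cpp
// Original pattern: both operations use the same constant, 1024.0.
double scale_down(double value)
{
    return value / 1024.0;
}

double scale_up(double value)
{
    return value * 1024.0;
}

// After the reciprocal transform the division becomes a multiplication,
// but it now needs its own constant (1.0 / 1024.0 == 0.0009765625 exactly),
// so the method ends up referencing two constants instead of one.
double scale_down_transformed(double value)
{
    return value * 0.0009765625;
}
```

Both forms of `scale_down` return identical results because the reciprocal is exact; the trade-off described above is purely the cost of the additional constant load versus the cost of the division.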

src/coreclr/jit/morph.cpp

EgorBo

a couple of nits, but fine as is.

tannergooding added 5 commits December 9, 2025 15:26

Move the val / cnsPow2 transform to post-order so it can utilize constant folding

1e1c130

Add the val / cnsPow2 transform for hwintrinsic nodes

29fc697

Add a transform of val * -1.0 to -val

714370f

Add the 'val * 2.0' transform for hwintrinsic nodes

4176952

Fix compilation

51cb53b

github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 9, 2025

dotnet-policy-service bot assigned tannergooding Dec 9, 2025

Apply formatting patch

fcd63d0

This was referenced Dec 10, 2025

iOS device not found in OSX.13.Amd64.Iphone.Open dotnet/dnceng#6440

Open

Sometimes the helix SDK uses GetWorkItemsAsync when workitems aren't done processing. dotnet/dnceng#6011

Open

tannergooding marked this pull request as ready for review December 11, 2025 01:17

Copilot AI review requested due to automatic review settings December 11, 2025 01:17

Copilot started reviewing on behalf of tannergooding December 11, 2025 01:18

Copilot AI reviewed Dec 11, 2025

Respond to PR feedback

96453c6

tannergooding force-pushed the fix-122272 branch from 31a1bed to 96453c6 Compare December 11, 2025 02:07

EgorBo reviewed Jan 7, 2026

src/coreclr/jit/morph.cpp

EgorBo reviewed Jan 7, 2026

src/coreclr/jit/morph.cpp

EgorBo reviewed Jan 7, 2026

src/coreclr/jit/morph.cpp

EgorBo approved these changes Jan 7, 2026

This was referenced Jan 7, 2026

Extend morph optimization for DIV(NEG(x), C) to also apply to MUL(NEG(x), C) #122974

Open

Consider removing some of the if (fgGlobalMorph) checks as it no longer interferes with CSE candidates #122975

Open

tannergooding merged commit 4d88303 into dotnet:main Jan 7, 2026
123 checks passed

tannergooding deleted the fix-122272 branch January 7, 2026 16:03
