-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Handle some additional floating-point and SIMD morph optimizations #122368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
|
This existing transform to remove double negation doesn't seem to apply here. runtime/src/coreclr/jit/morph.cpp Lines 8117 to 8125 in 0fb17f7
Probably some ordering issue? double NegateMulTest(double num)
{
num *= -1;
num *= -1;
num *= -1;
return num;
}G_M43396_IG02: ;; offset=0x0000
vxorps xmm0, xmm0, xmmword ptr [reloc @RWD00]
vxorps xmm0, xmm0, xmmword ptr [reloc @RWD00]
vxorps xmm0, xmm0, xmmword ptr [reloc @RWD00]Meanwhile: double NegateMulOld(double num)
{
num = -num;
num = -num;
num = -num;
return num;
}G_M22709_IG02: ;; offset=0x0000
vxorps xmm0, xmm0, xmmword ptr [reloc @RWD00] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements additional floating-point and SIMD morph optimizations by moving and extending transformation logic. The changes include optimizations for division-to-multiplication conversion (when divisor is a power of 2), multiplication by -1 (converting to negation), and multiplication by 2 (converting to addition).
Key changes:
- Relocated the scalar floating-point division-to-multiplication optimization from early in GT_DIV case to after GT_NEG transformation, ensuring it applies the reciprocal optimization correctly after sign transformations
- Added new scalar floating-point optimizations for multiplication by -1 and 2 in fgOptimizeMultiply
- Extended SIMD GT_DIV and GT_MUL cases with similar optimizations including division-to-multiplication for power-of-2 divisors, multiplication by -1 to negation, and multiplication by 2 to addition
Comments suppressed due to low confidence (1)
src/coreclr/jit/morph.cpp:10561
- The DEBUG_DESTROY_NODE is missing for the original mul node. This case creates a new add node and returns it but doesn't destroy the original multiplication node, similar to the issue on line 9846. For consistency with line 10535 which properly destroys mul when returning op1, this case should also include DEBUG_DESTROY_NODE(mul) before returning add.
DEBUG_DESTROY_NODE(op2);
op2 = fgMakeMultiUse(&op1);
GenTree* add = gtNewOperNode(GT_ADD, mul->TypeGet(), op1, op2);
add->SetMorphed(this, /* doChildren */ true);
return add;
Yes, there's a lack of attempting to "re-morph" here once the new operation is recognized. I added in a "hack" specifically to handle this case. Ideally there's a larger refactoring of (re-morph is in quotes because we don't actually try to morph twice and there's various transforms we don't want to repeat, but there is some post-order processing we can defer back to after the oper type changes if we have the right helper methods). |
31a1bed to
96453c6
Compare
|
CC. @dotnet/jit-contrib, @EgorBo No TP regression and some minor positive diffs: https://dev.azure.com/dnceng-public/public/_build/results?buildId=1227641&view=ms.vss-build-web.run-extensions-tab Few simplifications for each case added, such as negating instead of multiplying or adding instead of multiplying by two. There does look to be a missing AVX-512 containment opportunity around There are also a couple small size regressions on x64 due to different register selection changing codegen output minimally or causing an extra prologue spill and epilogue restore. There's also one case where some code is doing division and multiplication by the same power of 2 value and so converting to multiplication by a reciprocal causes an extra constant to exist and be used. Handling this is probably difficult without some kind of interplay with CSE/VN to see number of uses of a given value (so a decision can be made on whether its more expensive to do a second load from L1 or to spend extra cycles doing division). -- With the cases in the diff all still being profitable ones, so the PR looks good overall |
EgorBo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a couple of nits, but fine as is.
This resolves #122272