-
Notifications
You must be signed in to change notification settings - Fork 15.9k
[CodeGen][MachineLICM] Use RegUnits in HoistRegionPostRA #94608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -223,8 +223,8 @@ namespace { | |||||||||||||||||||||||||||||||||||||||||||
| void HoistPostRA(MachineInstr *MI, unsigned Def, MachineLoop *CurLoop, | ||||||||||||||||||||||||||||||||||||||||||||
| MachineBasicBlock *CurPreheader); | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| void ProcessMI(MachineInstr *MI, BitVector &PhysRegDefs, | ||||||||||||||||||||||||||||||||||||||||||||
| BitVector &PhysRegClobbers, SmallSet<int, 32> &StoredFIs, | ||||||||||||||||||||||||||||||||||||||||||||
| void ProcessMI(MachineInstr *MI, BitVector &RUDefs, BitVector &RUClobbers, | ||||||||||||||||||||||||||||||||||||||||||||
| SmallSet<int, 32> &StoredFIs, | ||||||||||||||||||||||||||||||||||||||||||||
| SmallVectorImpl<CandidateInfo> &Candidates, | ||||||||||||||||||||||||||||||||||||||||||||
| MachineLoop *CurLoop); | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
|
|
@@ -423,10 +423,25 @@ static bool InstructionStoresToFI(const MachineInstr *MI, int FI) { | |||||||||||||||||||||||||||||||||||||||||||
| return false; | ||||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||
| static void applyBitsNotInRegMaskToRegUnitsMask(const TargetRegisterInfo *TRI, | ||||||||||||||||||||||||||||||||||||||||||||
Pierre-vh marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||||||||||||||||||||||||||||||||||||||||
| BitVector &RUs, | ||||||||||||||||||||||||||||||||||||||||||||
| const uint32_t *Mask) { | ||||||||||||||||||||||||||||||||||||||||||||
| // FIXME: Use RUs better here | ||||||||||||||||||||||||||||||||||||||||||||
| BitVector MaskedRegs(TRI->getNumRegs()); | ||||||||||||||||||||||||||||||||||||||||||||
| MaskedRegs.setBitsNotInMask(Mask); | ||||||||||||||||||||||||||||||||||||||||||||
| for (const auto &Set : MaskedRegs.set_bits()) { | ||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||
| for (const auto &Set : MaskedRegs.set_bits()) { | |
| for (unsigned SetBitIndex : MaskedRegs.set_bits()) { |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code change originates from here. The SI_CALL has a regmask with SGPR4:
dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr8_sgpr9, 0, <regmask $sgpr_null $sgpr_null_hi $src_private_base $src_private_base_hi $src_private_base_lo $src_private_limit $src_private_limit_hi $src_private_limit_lo $src_shared_base $src_shared_base_hi $src_shared_base_lo $src_shared_limit $src_shared_limit_hi $src_shared_limit_lo $sgpr4 $sgpr5 $sgpr6 $sgpr7 $sgpr8 $sgpr9 $sgpr10 $sgpr11 $sgpr12 $sgpr13 $sgpr14 $sgpr15 $sgpr16 $sgpr17 $sgpr18 $sgpr19 $sgpr20 $sgpr21 $sgpr22 and 1139 more...>, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr4
I think the regression is actually a bugfix. We can't hoist a def of s4 because the indirect call may clobber anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regmask includes sgpr4, which means the call preserves s4, so we can hoist a def of s4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could pre-commit removal of the redundant !Reg check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your code calls RUDefs.set even if it was already set. How about:
| if (RUDefs.test(*RUI)) { | |
| RUClobbers.set(*RUI); | |
| RuledOut = true; | |
| } else if (RUClobbers.test(*RUI)) { | |
| // MI defined register is seen defined by another instruction in | |
| // the loop, it cannot be a LICM candidate. | |
| RuledOut = true; | |
| } | |
| RUDefs.set(*RUI); | |
| if (RUDefs.test(*RUI)) { | |
| RUClobbers.set(*RUI); | |
| RuledOut = true; | |
| } else { | |
| RUDefs.set(*RUI); | |
| if (RUClobbers.test(*RUI)) { | |
| // MI defined register is seen defined by another instruction in | |
| // the loop, it cannot be a LICM candidate. | |
| RuledOut = true; | |
| } | |
| } |
Maybe it doesn't actually run any faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't accounting for the lanemasks in liveins, could possibly be the regression reason
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it didn't account for them before either, no?
I'm trying to keep this as close as possible to a NFC so I'd rather add that as a follow up change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure exactly when lane liveins are introduced, or how they interact with aliases. This is another reason that tracking liveins by regunits would be easier to understand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This misses regmasks for some reason, but it also missed them before?
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -886,12 +886,12 @@ define void @test_indirect_call_vgpr_ptr_inreg_arg(ptr %fptr) { | |
| ; GCN-NEXT: v_writelane_b32 v40, s62, 30 | ||
| ; GCN-NEXT: v_writelane_b32 v40, s63, 31 | ||
| ; GCN-NEXT: s_mov_b64 s[6:7], exec | ||
| ; GCN-NEXT: s_movk_i32 s4, 0x7b | ||
| ; GCN-NEXT: .LBB6_1: ; =>This Inner Loop Header: Depth=1 | ||
| ; GCN-NEXT: v_readfirstlane_b32 s8, v0 | ||
| ; GCN-NEXT: v_readfirstlane_b32 s9, v1 | ||
| ; GCN-NEXT: v_cmp_eq_u64_e32 vcc, s[8:9], v[0:1] | ||
| ; GCN-NEXT: s_and_saveexec_b64 s[10:11], vcc | ||
| ; GCN-NEXT: s_movk_i32 s4, 0x7b | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is the only regression I observed. I didn't investigate it in depth yet but I plan to spend some time on it to figure it out. |
||
| ; GCN-NEXT: s_swappc_b64 s[30:31], s[8:9] | ||
| ; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1 | ||
| ; GCN-NEXT: s_xor_b64 exec, exec, s[10:11] | ||
|
|
@@ -980,12 +980,12 @@ define void @test_indirect_call_vgpr_ptr_inreg_arg(ptr %fptr) { | |
| ; GISEL-NEXT: v_writelane_b32 v40, s62, 30 | ||
| ; GISEL-NEXT: v_writelane_b32 v40, s63, 31 | ||
| ; GISEL-NEXT: s_mov_b64 s[6:7], exec | ||
| ; GISEL-NEXT: s_movk_i32 s4, 0x7b | ||
| ; GISEL-NEXT: .LBB6_1: ; =>This Inner Loop Header: Depth=1 | ||
| ; GISEL-NEXT: v_readfirstlane_b32 s8, v0 | ||
| ; GISEL-NEXT: v_readfirstlane_b32 s9, v1 | ||
| ; GISEL-NEXT: v_cmp_eq_u64_e32 vcc, s[8:9], v[0:1] | ||
| ; GISEL-NEXT: s_and_saveexec_b64 s[10:11], vcc | ||
| ; GISEL-NEXT: s_movk_i32 s4, 0x7b | ||
| ; GISEL-NEXT: s_swappc_b64 s[30:31], s[8:9] | ||
| ; GISEL-NEXT: ; implicit-def: $vgpr0 | ||
| ; GISEL-NEXT: s_xor_b64 exec, exec, s[10:11] | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit longer than just creating a BitVector but it avoids a dynamic allocation which I think is worth it.
Otherwise we instantiate a BitVector pretty much on each instruction.