
Commit 887a2e8

pratikashar authored and igcbot committed
Skip RMW for loop split temporaries
RMW optimization in spill insertion should conservatively assume that loop split temporaries require a read-modify-write sequence. The following explains why this is required:

// ********************
// Before:
// Loop_Header:
// V10:uq = ...
// ...
// jmpi Loop_Header
// ...
// = V10
// ********************
// After split around loop:
//
// (W) LOOP_TMP = V10
// Loop_Header:
// LOOP_TMP:uq = ...
// ...
// jmpi Loop_Header
// (W) V10 = LOOP_TMP
// ...
// = V10
// ********************
// Since V10 is spilled already, spill/fill code is inserted as:
// (Note that program no longer has direct references to V10)
//
// (W) FILL_TMP = Fill from V10 offset
// (W) LOOP_TMP = FILL_TMP
// Loop_Header:
// LOOP_TMP:uq = ...
// ...
// jmpi Loop_Header
// (W) SPILL_TMP = LOOP_TMP
// (W) Spill SPILL_TMP to V10 offset
// ...
// (W) FILL_TMP1 = Fill from V10 offset
// = FILL_TMP1
// ********************
//
// If LOOP_TMP is spilled in later iteration, we need to check whether
// RMW is needed for its def in the loop body. But by this iteration
// all original references to V10 have already been transformed to
// temporary ranges, so we cannot easily determine dominance relation
// between LOOP_TMP and other V10 references. If LOOP_TMP doesn't
// dominate all defs and uses then it would be illegal to skip RMW. Hence,
// we conservatively assume RMW is required for LOOP_TMP.
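To make the cost of wrongly skipping RMW concrete, here is a toy C++ model of a partial (masked) write to a spill slot. It is only a sketch under stated assumptions: the 8-lane register, the bit mask, and the names `Reg`, `maskedDef`, `spillWithRMW`, and `spillWithoutRMW` are all hypothetical and do not come from the compiler source. The stale-value constant simply stands in for whatever happened to be in the temporary register.

```cpp
#include <array>
#include <cstdint>

// Hypothetical toy model, not compiler code: an 8-lane register.
constexpr int kLanes = 8;
using Reg = std::array<std::uint32_t, kLanes>;

// A masked def writes only the lanes whose bit is set in `mask`.
void maskedDef(Reg &dst, const Reg &value, std::uint8_t mask) {
  for (int i = 0; i < kLanes; ++i) {
    if (mask & (1u << i)) {
      dst[i] = value[i];
    }
  }
}

// Spill with read-modify-write: fill the temporary from the slot,
// apply the masked def, then write the whole temporary back.
void spillWithRMW(Reg &slot, const Reg &value, std::uint8_t mask) {
  Reg tmp = slot;               // read (fill)
  maskedDef(tmp, value, mask);  // modify
  slot = tmp;                   // write (spill)
}

// Spill without RMW: the temporary starts with unrelated contents,
// so lanes the def does not write overwrite the slot with stale data.
void spillWithoutRMW(Reg &slot, const Reg &value, std::uint8_t mask) {
  Reg tmp;
  tmp.fill(0xDEADBEEFu);        // assumed stand-in for stale register contents
  maskedDef(tmp, value, mask);
  slot = tmp;
}
```

In this model, skipping the fill is harmless only when no later reader depends on the lanes the def left untouched, which is the property the dominance checks are meant to establish.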
1 parent f3a5f48 commit 887a2e8

1 file changed, +53 -5 lines changed

visa/SpillManagerGMRF.cpp

Lines changed: 53 additions & 5 deletions
@@ -2800,13 +2800,61 @@ void SpillManagerGRF::updateRMWNeeded() {
   // Check0 : Def is NoMask, -- checked in isPartialWriteForSpill()
   // Check1 : Def is unique def,
   // Check2 : Def is in loop L and all use(s) of dcl are in loop L or it's
-  // inner loop nest, Check3 : Flowgraph is reducible RMW_Not_Needed = Check0
-  // || (Check1 && Check2 && Check3)
+  // inner loop nest,
+  // Check3 : Flowgraph is reducible
+  // Check4 : Dcl is not a split around loop temp
+  // RMW_Not_Needed = (Check0 || (Check1 && Check2 && Check3)) && Check4
   bool RMW_Needed = true;
 
-  if (isUniqueDef && builder_->kernel.fg.isReducible() &&
-      checkDefUseDomRel(spilledRegion, bb)) {
-    RMW_Needed = false;
+  // Reason for Check4:
+  // ********************
+  // Before:
+  // Loop_Header:
+  // V10:uq = ...
+  // ...
+  // jmpi Loop_Header
+  // ...
+  // = V10
+  // ********************
+  // After split around loop:
+  //
+  // (W) LOOP_TMP = V10
+  // Loop_Header:
+  // LOOP_TMP:uq = ...
+  // ...
+  // jmpi Loop_Header
+  // (W) V10 = LOOP_TMP
+  // ...
+  // = V10
+  // ********************
+  // Since V10 is spilled already, spill/fill code is inserted as:
+  // (Note that program no longer has direct references to V10)
+  //
+  // (W) FILL_TMP = Fill from V10 offset
+  // (W) LOOP_TMP = FILL_TMP
+  // Loop_Header:
+  // LOOP_TMP:uq = ...
+  // ...
+  // jmpi Loop_Header
+  // (W) SPILL_TMP = LOOP_TMP
+  // (W) Spill SPILL_TMP to V10 offset
+  // ...
+  // (W) FILL_TMP1 = Fill from V10 offset
+  // = FILL_TMP1
+  // ********************
+  //
+  // If LOOP_TMP is spilled in later iteration, we need to check whether
+  // RMW is needed for its def in the loop body. But by this iteration
+  // all original references to V10 have already been transformed to
+  // temporary ranges, so we cannot easily determine dominance relation
+  // between LOOP_TMP and other V10 references. If LOOP_TMP doesn't
+  // dominate all defs and uses then it would be illegal to skip RMW. Hence,
+  // we conservatively assume RMW is required for LOOP_TMP.
+  if (gra.splitResults.count(spilledRegion->getTopDcl()) == 0) {
+    if (isUniqueDef && builder_->kernel.fg.isReducible() &&
+        checkDefUseDomRel(spilledRegion, bb)) {
+      RMW_Needed = false;
+    }
   }
 
   return RMW_Needed;
