-
Notifications
You must be signed in to change notification settings - Fork 10
pre-commit: PR165258 #3007
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pre-commit: PR165258 #3007
Conversation
Diff moderunner: ariselab-64c-docker 8147 files changed, 1318112 insertions(+), 1323505 deletions(-) +6 llvm/Target.ll |
model: qwen-plus-latest |
| _ZN4llvm12StringSwitchINS_5MachO12PlatformTypeES2_E4CaseENS_13StringLiteralES2_.exit97.thread: ; preds = %_ZN4llvmeqENS_9StringRefES0_.exit.i.i, %_ZN4llvmeqENS_9StringRefES0_.exit.i.i14, %_ZN4llvmeqENS_9StringRefES0_.exit.i.i94, %_ZN4llvmeqENS_9StringRefES0_.exit.i.i86 | ||
| %.sroa.30.11.ph = phi i64 [ 4294967297, %_ZN4llvmeqENS_9StringRefES0_.exit.i.i14 ], [ 4294967296, %_ZN4llvmeqENS_9StringRefES0_.exit.i.i ], [ 0, %_ZN4llvmeqENS_9StringRefES0_.exit.i.i86 ], [ 0, %_ZN4llvmeqENS_9StringRefES0_.exit.i.i94 ] | ||
| br label %_ZN4llvm12StringSwitchINS_5MachO12PlatformTypeES2_E4CaseENS_13StringLiteralES2_.exit105 | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the reason: it seems to be the missed optimization of TryToSimplifyUncondBranchFromEmptyBlock, which bails out and cannot fold other phi values in BB into the successor Succ if there are >1 shared predecessors between BB and Succ.
Details
The original JumpThreading generates the following block:
foo.exit97.thread6: ; preds = %i.i, %i.i14, %i.i86
%.sroa.10.ph = phi i64 [ 5, %i.i14 ], [ 7, %i.i ], [ 9, %i.i86 ]
%.sroa.30.10.ph = phi i64 [ 4294967297, %i.i14 ], [ 4294967296, %i.i ], [ 0, %i.i86 ]
br label %foo.exit105
%foo.exit105:
...While there is only ONE common predecessor (i.e., %i.i86) between foo.exit97.thread6 and its successor %foo.exit105, TryToSimplifyUncondBranchFromEmptyBlock can fold other phi node values into %foo.exit105, as follows:
foo.exit97.thread6: ; preds = %i.i86
br label %foo.exit105
%foo.exit105:
_ = phi i64 ..., [ 5, %i.i14 ], [ 7, %i.i ], [9, %foo.exit97.thread6]
_ = phi i64 ..., [ 4294967297, %i.i14 ], [ 4294967296, %i.i ], [0, %foo.exit97.thread6]
...Eventually, with the simplified branch values of {{0}, {9}}, JumpThreading can further fold foo.exit97.thread6 into %foo.exit105's successor %foo.exit105.thread as follows:
BB 'foo.exit105': FOUND condition = i1 true for pred 'foo.exit97.thread6'.
Threading edge from 'foo.exit97.thread6' to 'foo.exit105.thread', across block: foo.exit105
The enhancement of SCCP makes JumpThreading optimize more, generating the following block (ONE more block of %i.i94 merged):
foo.exit97.thread: ; preds = %i.i, %i.i14, %i.i94, %i.i86
%.sroa.0.ph = phi i64 [ 5, %i.i14 ], [ 7, %i.i ], [ 9, %i.i86 ], [ 4, %i.i94 ]
%.sroa.1.ph = phi i64 [ 4294967297, %i.i14 ], [ 4294967296, %i.i ], [ 0, %i.i86 ], [ 0, %i.i94 ]
br label %foo.exit105
%foo.exit105:
...%i.i94 is also a shared predecessor between %foo.exit97.thread and %foo.exit105, leading to there are TWO shared predecessors (%i.i94 and %i.i86).
However, currently, TryToSimplifyUncondBranchFromEmptyBlock cannot fold other phi values if there are >1 shared predecessors.
Eventually, JumpThreading CANNOT further fold foo.exit97.thread6 into %foo.exit105's successor %foo.exit105.thread as the values can be derived from that branch are too complex ( {{0, 4294967296, 4294967297}, {4, 5, 7, 9}} ).
Maybe we can fold other phi values in such case, as follows:
foo.exit97.thread: ; preds = %i.i94, %i.i86
; %.sroa.0.ph = phi i64 [ 5, %i.i14 ], [ 7, %i.i ], [ 9, %i.i86 ], [ 4, %i.i94 ]
; %.sroa.1.ph = phi i64 [ 4294967297, %i.i14 ], [ 4294967296, %i.i ], [ 0, %i.i86 ], [ 0, %i.i94 ]
%.sroa.0.ph = phi i64 [ 9, %i.i86 ], [ 4, %i.i94 ]
%.sroa.1.ph = phi i64 [ 0, %i.i86 ], [ 0, %i.i94 ]
br label %foo.exit105
%foo.exit105:
_ = phi i64 ..., [ 5, %i.i14 ], [ 7, %i.i ], [%.sroa.0.ph, %foo.exit97.thread6]
_ = phi i64 ..., [ 4294967297, %i.i14 ], [ 4294967296, %i.i ], [%.sroa.1.ph, %foo.exit97.thread6]
...Should we do such optimization?
|
|
||
| ; Function Attrs: nounwind uwtable | ||
| define range(i32 -1, 2) i32 @Abc_NtkCheckConstant_rec(ptr noundef %0) local_unnamed_addr #0 { | ||
| define i32 @Abc_NtkCheckConstant_rec(ptr noundef %0) local_unnamed_addr #0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regression in SCCP: fails to derive the return range of such a recursive function.
The recursion expression is as follows:
The core reason is the monotony of SCCP analysis: using union (mergeInValue) to update the lattice state, so that the value state can only evolve in an increasing direction.
E.g., for predicate info, we infer a new range via
- Assume that initial
$\text{CR} = \bot$ ,$\text{ImposedCR} = [0, 2]$ (edge constraint), and$\text{CopyCR} = \bot$ (e.g., a ret value from a function call with unknown ret range). - Then we can infer that
$\text{CR} = [0, 2]$ - If we can infer the range of
$\text{CopyCR}$ after as$[-1, 1]$ , we expect a new$\text{CR}$ as$[0, 1]$ - However, as we use
$\cup$ to update the range, the final$\text{CR}$ unchanges.
Maybe, designated for constantrange, we can relax the monotony of range evolution.
Details
Considering such a simplified situation:
define i8 @foo() {
...
switch:
%ret = tail call i8 @foo()
switch i8 %ret, label %default [
i32 0, label %end
]
default:
br label %end
...
end:
%phi = phi i8 [ 0, %switch], [ %ret, %default ]
ret i8 %phi
}If the DFS order of SCCP is switch-> end -> default,
Before this patch, SCCP performed analysis as follows:
- Visit
switch:-
%ret = tail call i8 @foo()-->$\bot$ (unknown), as the ret range offoois unknown. - Mark edges
switch -> endandswitch -> defaultas feasible (markEdgeExecutable)
-
- Visit
end:-
%phi = phi i8 [ 0, %switch], [ %ret, %default ]->$[0,1)$ (constantrange), as edgedefault -> endis not feasible temporarily. -
ret i8 %phi-> set the ret range offooas$[0,1)$ - Add user
%ret = tail call i8 @foo()to worklist - Update
%retas$[0,1) = \bot \cup [0,1)$ , as old range is$\bot$ and the new range is$[0,1)$ .
-
- Visit
default:- Mark edge
default-> endas feasible (markEdgeExecutable) and push%phito worklist - Update
%phias union of0and%ret->$[0,1) \cup [0,1)= [0,1)$ - SCCP ends with
%phiunchanged.
- Mark edge
- The final ret range of
foois$[0,1)$
After this patch, SCCP performed analysis as follows:
- PredicateInfo:
- Insert noop predicate info
%ret.default = bitcast i8 %ret to i32for edgeswitch -> default
- Insert noop predicate info
- Visit
switch:-
%ret = tail call i8 @foo()-->$\bot$ (unknown), as the ret range offoois unknown. -
handlePredicateInfo:%ret.default = bitcast i8 %ret to i32->$[1, 0) = \bot \cup \big ([1,0) \cap \bot \big)$ =OrigCR ∪ (ImposedCR ∩ RetCR);ImposedCR = [1,0)meets the edge requirement. - Mark edges
switch -> endandswitch -> defaultas feasible (markEdgeExecutable)
-
- Visit
end:-
%phi = phi i8 [ 0, %switch], [ %ret.default, %default ]->[0,1)(constantrange), as edgedefault -> endis not feasible temporarily. -
ret i8 %phi-> set the ret range offooas[0,1) - Add user
%ret = tail call i8 @foo()to worklist - Update
%retas$[0,1) = \bot \cup [0,1)$ , as old range is$\bot$ and the new range is$[0,1)$ . - Add user
%ret.defaultto worklist to update it. -
handlePredicateInfo:%ret.default = bitcast i8 %ret to i32->$[1, 0) = [1,0) \cup \big ([1,0) \cap [0, 1) \big)$ =OrigCR ∪ (ImposedCR ∩ RetCR).
-
- Visit
default:- Mark edge
default-> endas feasible (markEdgeExecutable) and push%phito worklist - Update
%phias union of0and%ret->$[0,1) \cup [1,0)= \top$ - Update ret range of
fooas$\top$ , i.e.,overdefined - .... SCCP ends with
%phiunchanged.
- Mark edge
- The final ret range of
foois$\top$ , i.e.,overdefined.

Link: llvm/llvm-project#165258
Requested by: @Camsyn