@zyw-bot zyw-bot commented Oct 31, 2025

Link: llvm/llvm-project#165258
Requested by: @Camsyn

@github-actions github-actions bot mentioned this pull request Oct 31, 2025

zyw-bot commented Oct 31, 2025

Diff mode

runner: ariselab-64c-docker
baseline: llvm/llvm-project@511c9c0
patch: llvm/llvm-project#165258
sha256: c92a254e91f8790b15e188e6ce95b60c0239a21056effe25e9e0bd537083bef6
commit: 88be3e6

8147 files changed, 1318112 insertions(+), 1323505 deletions(-)

Improvements:
  mem2reg.NumPromoted 70917 -> 71311 +0.56%
  argpromotion.NumArgumentsDead 476619 -> 478442 +0.38%
  function-attrs.NumReadNoneArg 836667 -> 838484 +0.22%
  sccp.NumDeadBlocks 685312 -> 686614 +0.19%
  argpromotion.NumArgumentsPromoted 699786 -> 701113 +0.19%
  function-attrs.NumReadOnlyArg 1881577 -> 1885029 +0.18%
  function-attrs.NumCapturesNone 3404731 -> 3409094 +0.13%
  bdce.NumRemoved 383814 -> 384086 +0.07%
  instcombine.NumSunkInst 3402777 -> 3404705 +0.06%
  globaldce.NumFunctions 348446 -> 348636 +0.05%
Regressions:
  correlated-value-propagation.NumDeadCases 65942 -> 56499 -14.32%
  correlated-value-propagation.NumSExt 47306 -> 46990 -0.67%
  correlated-value-propagation.NumAShrsConverted 3484 -> 3465 -0.55%
  sccp.NumInstReplaced 132887 -> 132233 -0.49%
  correlated-value-propagation.NumNNeg 96021 -> 95768 -0.26%
  simplifycfg.NumSinkCommonCode 378565 -> 377679 -0.23%
  correlated-value-propagation.NumCmps 272160 -> 271564 -0.22%
  local.NumRemoved 5303713 -> 5293801 -0.19%
  correlated-value-propagation.NumUDivURemsNarrowedExpanded 1674 -> 1671 -0.18%
  adce.NumRemoved 95338 -> 95174 -0.17%

+6 llvm/Target.ll
+3 wireshark/packet-rf4ce-nwk.ll
+3 z3/expr_pattern_match.ll
+2 abc/abcSweep.ll
+1 casadi/rootfinder.ll
+1 hdf5/H5Olink.ll
+1 libjpeg-turbo/rdjpgcom.ll
+1 libjpeg-turbo/tjunittest.ll
+1 llama.cpp/llama-vocab.ll
+0 cxxopts/example.ll
+0 fish-rs/e69mx4kebbw5h90l2bpw0bwyt.ll
+0 gromacs/membed.ll
+0 lz4/lz4cli.ll
+0 tinyrenderer/model.ll
-1 abc/amapParse.ll
-1 abc/verFormula.ll
-1 arrow/string-to-double.ll
-1 box2d/imgui_demo.ll
-1 cpython/socketmodule.ll
-1 csmith/FunctionInvocationBinary.ll
-1 ffmpeg/hevcdec.ll
-1 ffmpeg/rv34.ll
-1 icu/double-conversion-string-to-double.ll
-1 linux/fault.ll
-1 mitsuba3/plastic.ll
-1 opencv/array.ll
-1 opencv/ts_perf.ll
-1 openusd/string-to-double.ll
-1 php/zend_ini_scanner.ll
-1 postgres/gindatapage.ll
-1 postgres/pl_exec.ll
-1 postgres/type.ll
-1 quiche-rs/6lp2oyapnsojevo64mk9ap806.ll
-1 ruby/date_parse.ll
-1 stockfish/uci.ll
-1 typst-rs/3kgmqnxcsl3z3n0n.ll
-1 wireshark/packet-ansi_637.ll
-1 zed-rs/dw4qzuo904yf8wu71sutofhxl.ll
-2 abc/ioUtil.ll
-2 icu/number_patternstring.ll
-2 linux/locks.ll
-2 nix/print-ambiguous.ll
-2 openjdk/jvmFlag.ll
-2 zed-rs/20igqmfettcex48uahr8huyna.ll
-2 zed-rs/2g6g1uvat5pik6wc3r3hl3kr7.ll
-3 cmake/zstd_compress.ll
-3 duckdb/zstd_compress_superblock.ll
-3 linux/xhci-debugfs.ll
-3 meshoptimizer/vertexcodec.ll
-3 ruff-rs/1t5d2y321zgutphrasyamrpjz.ll
-3 rustfmt-rs/3xcdaapyewyrfogi.ll
-3 zstd/zstd_compress.ll
-4 cpython/listobject.ll
-4 luau/Quantify.ll
-4 meilisearch-rs/48hhebymxr5ff2nk.ll
-5 flatbuffers/idl_gen_kotlin_kmp.ll
-5 openssl/quic_stream_map.ll
-5 openssl/rsa_ameth.ll
-5 postgres/jsonfuncs.ll
-5 quantlib/fdklugeextouspreadengine.ll
-6 llvm/InstrProfiling.ll
-6 llvm/Program.ll
-6 typst-rs/1ru1rhojhbz2vfey.ll
-6 typst-rs/59tuvc5m3xlovl3o.ll
-8 wasmtime-rs/24jxjxhx40nukvhl.ll
-10 hermes/JSObject.ll
-10 ruff-rs/9ezhgv3vaoku7b96fwwr4f701.ll
-10 rust-analyzer-rs/hf9vzunhg9aziex.ll
-11 c3c/sema_casts.ll
-16 pola-rs/dgtr4n6toyrs0lo6gtn8sd4wk.ll
-19 c3c/types.ll
-20 lief/bignum.ll
-27 hermes/RegexParser.ll
-28 diesel-rs/32aaw0bzsmxs81tm.ll
-55 z3/realclosure.ll
-56 diesel-rs/285i4t9uy6n6phhi.ll
-64 minetest/serialization.ll

@github-actions

  1. Control Flow Simplification: Several functions replace complex conditional checks with simpler icmp comparisons and direct branching, improving code clarity and potentially performance by reducing unnecessary operations.

  2. Phi Node Adjustments: Multiple phi nodes in loops and exit blocks are updated to reflect changes in control flow, such as redirecting incoming blocks or simplifying value selection, ensuring correct dominance and SSA form after structural changes.

  3. Switch Statement Optimization: Some switch statements are streamlined by removing unreachable cases or consolidating duplicate labels, reducing branch overhead and enhancing readability.

  4. Memory Operation Refinement: Stores and loads are adjusted for better alignment and precision (e.g., replacing wide stores with narrower ones), and redundant memory accesses are eliminated through improved aliasing and scope metadata.

  5. Function Signature Updates: Certain function declarations now include range attributes on parameters, providing more precise semantic information to the optimizer, which can enable better optimization decisions.

model: qwen-plus-latest
CompletionUsage(completion_tokens=187, prompt_tokens=107175, total_tokens=107362, completion_tokens_details=None, prompt_tokens_details=None)

_ZN4llvm12StringSwitchINS_5MachO12PlatformTypeES2_E4CaseENS_13StringLiteralES2_.exit97.thread: ; preds = %_ZN4llvmeqENS_9StringRefES0_.exit.i.i, %_ZN4llvmeqENS_9StringRefES0_.exit.i.i14, %_ZN4llvmeqENS_9StringRefES0_.exit.i.i94, %_ZN4llvmeqENS_9StringRefES0_.exit.i.i86
%.sroa.30.11.ph = phi i64 [ 4294967297, %_ZN4llvmeqENS_9StringRefES0_.exit.i.i14 ], [ 4294967296, %_ZN4llvmeqENS_9StringRefES0_.exit.i.i ], [ 0, %_ZN4llvmeqENS_9StringRefES0_.exit.i.i86 ], [ 0, %_ZN4llvmeqENS_9StringRefES0_.exit.i.i94 ]
br label %_ZN4llvm12StringSwitchINS_5MachO12PlatformTypeES2_E4CaseENS_13StringLiteralES2_.exit105

Regression?

This change comes from JumpThreading receiving different input (I am still trying to figure out exactly what happens).
The input differs because SCCP folds a conditional br into an unconditional br, as follows:
[image]

@Camsyn Camsyn Nov 3, 2025

I see the reason: this looks like a missed optimization in TryToSimplifyUncondBranchFromEmptyBlock, which bails out and does not fold the other phi values in BB into its successor Succ when BB and Succ share more than one predecessor.

Details


The original JumpThreading generates the following block:

foo.exit97.thread6:                               ; preds = %i.i, %i.i14, %i.i86
  %.sroa.10.ph = phi i64 [ 5, %i.i14 ], [ 7, %i.i ], [ 9, %i.i86 ]
  %.sroa.30.10.ph = phi i64 [ 4294967297, %i.i14 ], [ 4294967296, %i.i ], [ 0, %i.i86 ]
  br label %foo.exit105
%foo.exit105:
  ...

When there is only ONE common predecessor (i.e., %i.i86) between foo.exit97.thread6 and its successor foo.exit105, TryToSimplifyUncondBranchFromEmptyBlock can fold the other phi values into foo.exit105, as follows:

foo.exit97.thread6:                               ; preds = %i.i86
  br label %foo.exit105
%foo.exit105:
  _ = phi i64 ..., [ 5, %i.i14 ], [ 7, %i.i ], [9, %foo.exit97.thread6]
  _ = phi i64 ..., [ 4294967297, %i.i14 ], [ 4294967296, %i.i ], [0, %foo.exit97.thread6]
  ...

With the simplified branch values of {{0}, {9}}, JumpThreading can then thread foo.exit97.thread6 across foo.exit105 to its successor foo.exit105.thread, as follows:

BB 'foo.exit105': FOUND condition = i1 true for pred 'foo.exit97.thread6'.
Threading edge from 'foo.exit97.thread6' to 'foo.exit105.thread', across block: foo.exit105

The SCCP enhancement lets JumpThreading thread more edges, generating the following block (ONE more predecessor, %i.i94, is merged in):

foo.exit97.thread:                                ; preds = %i.i, %i.i14, %i.i94, %i.i86
  %.sroa.0.ph = phi i64 [ 5, %i.i14 ], [ 7, %i.i ], [ 9, %i.i86 ], [ 4, %i.i94 ]
  %.sroa.1.ph = phi i64 [ 4294967297, %i.i14 ], [ 4294967296, %i.i ], [ 0, %i.i86 ], [ 0, %i.i94 ]
  br label %foo.exit105
%foo.exit105:
  ...

%i.i94 is also a shared predecessor of foo.exit97.thread and foo.exit105, so there are now TWO shared predecessors (%i.i94 and %i.i86).
However, TryToSimplifyUncondBranchFromEmptyBlock currently cannot fold the other phi values when there is more than one shared predecessor.

As a result, JumpThreading CANNOT thread foo.exit97.thread across foo.exit105 to its successor foo.exit105.thread, because the set of values that can be derived from that branch is too complex ( {{0, 4294967296, 4294967297}, {4, 5, 7, 9}} ).


Maybe we can still fold the other phi values in such a case, as follows:

foo.exit97.thread:                                ; preds = %i.i94, %i.i86
  ; %.sroa.0.ph = phi i64 [ 5, %i.i14 ], [ 7, %i.i ], [ 9, %i.i86 ], [ 4, %i.i94 ]
  ; %.sroa.1.ph = phi i64 [ 4294967297, %i.i14 ], [ 4294967296, %i.i ], [ 0, %i.i86 ], [ 0, %i.i94 ]
  %.sroa.0.ph = phi i64 [ 9, %i.i86 ], [ 4, %i.i94 ]
  %.sroa.1.ph = phi i64 [ 0, %i.i86 ], [ 0, %i.i94 ]
  br label %foo.exit105
%foo.exit105:
  _ = phi i64 ..., [ 5, %i.i14 ], [ 7, %i.i ], [ %.sroa.0.ph, %foo.exit97.thread ]
  _ = phi i64 ..., [ 4294967297, %i.i14 ], [ 4294967296, %i.i ], [ %.sroa.1.ph, %foo.exit97.thread ]
  ...

Should we implement such an optimization?
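
As a rough illustration of the proposed split, here is a toy Python model, not LLVM code. The helper `split_phi` and the dictionary representation of phi nodes are hypothetical, and the `"?"` entries stand for incoming values that the thread elides with `...`. The sketch moves the incoming values of non-shared predecessors into the successor's phi and keeps only the shared predecessors in BB.

```python
from typing import Dict, Tuple, Union

Value = Union[int, str]
Phi = Dict[str, Value]  # predecessor block name -> incoming value


def split_phi(bb: str, bb_phi: Phi, succ_phi: Phi) -> Tuple[Phi, Phi]:
    """Move BB's non-shared incoming values into Succ's phi.

    Returns (trimmed BB phi, new Succ phi). Predecessors that also branch
    directly to Succ stay in BB, so Succ never lists a predecessor twice.
    """
    shared = set(bb_phi) & set(succ_phi)
    kept = {p: v for p, v in bb_phi.items() if p in shared}
    moved = {p: v for p, v in bb_phi.items() if p not in shared}
    new_succ = {p: v for p, v in succ_phi.items() if p != bb}
    new_succ.update(moved)
    if kept:
        # Succ still reads BB's (now smaller) phi through the BB edge.
        new_succ[bb] = succ_phi[bb]
    return kept, new_succ


bb = "foo.exit97.thread"
bb_phi = {"i.i14": 5, "i.i": 7, "i.i86": 9, "i.i94": 4}
# "?" marks incoming values that the thread elides with "...".
succ_phi = {"i.i86": "?", "i.i94": "?", bb: "%.sroa.0.ph"}

kept, new_succ = split_phi(bb, bb_phi, succ_phi)
print(kept)
print(new_succ)
```

In this model the trimmed phi only ranges over {9, 4}, which is the smaller value set that JumpThreading would then have to reason about.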


; Function Attrs: nounwind uwtable
-define range(i32 -1, 2) i32 @Abc_NtkCheckConstant_rec(ptr noundef %0) local_unnamed_addr #0 {
+define i32 @Abc_NtkCheckConstant_rec(ptr noundef %0) local_unnamed_addr #0 {

@Camsyn Camsyn Nov 5, 2025

Regression in SCCP: fails to derive the return range of such a recursive function.

The recurrence is as follows:
$R^{k+1} = [-1, 2) \cup R^k$
$R^0 = [-1, 2)$
Here $k$ is the recursion depth, so the fixed point $R^*$ should have been inferred as $[-1, 2)$.
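
A short worked check of that fixed point, assuming the recurrence above covers every recursive return path:

```latex
\begin{aligned}
R^0     &= [-1, 2) \\
R^1     &= [-1, 2) \cup R^0 = [-1, 2) \\
R^{k+1} &= [-1, 2) \cup R^k = [-1, 2) \quad \text{(by induction on } k\text{)} \\
R^*     &= \bigcup_{k \ge 0} R^k = [-1, 2)
\end{aligned}
```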


The core reason is the monotonicity of the SCCP analysis: it uses union (mergeInValue) to update the lattice state, so a value's state can only grow.

E.g., for predicate info, we infer a new range via $\text{CR} = \text{CR} \cup ( \text{ImposedCR} \cap \text{CopyCR} )$.

  1. Assume that initially $\text{CR} = \bot$, $\text{ImposedCR} = [0, 2]$ (the edge constraint), and $\text{CopyCR} = \bot$ (e.g., a return value from a call to a function whose return range is unknown).
  2. Then we infer $\text{CR} = [0, 2]$.
  3. If we later infer $\text{CopyCR} = [-1, 1]$, we expect a new $\text{CR}$ of $[0, 1]$.
  4. However, because we use $\cup$ to update the range, the final $\text{CR}$ remains $[0, 2]$.
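
The four steps can be replayed with a toy Python model. This is a sketch only: it uses plain inclusive integer intervals rather than LLVM's ConstantRange, the helper names are hypothetical, and it assumes (as the steps do) that an unknown operand adds no constraint to the intersection.

```python
from typing import Optional, Tuple

# Inclusive integer interval [lo, hi]; None stands for bottom (unknown).
Range = Optional[Tuple[int, int]]


def intersect(imposed: Range, copy: Range) -> Range:
    """Intersect two ranges; an unknown operand adds no constraint (assumption)."""
    if imposed is None:
        return copy
    if copy is None:
        return imposed
    lo, hi = max(imposed[0], copy[0]), min(imposed[1], copy[1])
    if lo > hi:
        raise ValueError("empty intersection is not modelled in this sketch")
    return (lo, hi)


def union(a: Range, b: Range) -> Range:
    """Smallest interval covering both ranges; bottom is the identity."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def monotone_update(cr: Range, imposed: Range, copy: Range) -> Range:
    """CR = CR ∪ (ImposedCR ∩ CopyCR): the state can only grow."""
    return union(cr, intersect(imposed, copy))


imposed = (0, 2)
cr_step2 = monotone_update(None, imposed, None)         # CopyCR still unknown
cr_step4 = monotone_update(cr_step2, imposed, (-1, 1))  # CopyCR refined later
expected = intersect(imposed, (-1, 1))                  # what step 3 hopes for
print(cr_step2, cr_step4, expected)
```

The second update cannot shrink the result: the union with the earlier $[0, 2]$ absorbs the tighter $[0, 1]$.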

Perhaps, specifically for constant ranges, we could relax the monotonicity requirement on range evolution.


Details

Consider this simplified case:

define i8 @foo() {
...
switch:
  %ret = tail call i8 @foo()
  switch i8 %ret, label %default [
    i8 0, label %end
  ]
default:
  br label %end
...
end:
  %phi = phi i8 [ 0, %switch],  [ %ret, %default ]
  ret i8 %phi
}

Suppose SCCP visits the blocks in the DFS order switch -> end -> default.

Before this patch, SCCP performed analysis as follows:

  1. Visit switch:
    1. %ret = tail call i8 @foo() --> $\bot$ (unknown), as the ret range of foo is unknown.
    2. Mark edges switch -> end and switch -> default as feasible (markEdgeExecutable)
  2. Visit end:
    1. %phi = phi i8 [ 0, %switch ], [ %ret, %default ] -> $[0,1)$ (a constant range), as edge default -> end is not yet feasible.
    2. ret i8 %phi -> set the ret range of foo to $[0,1)$
    3. Add user %ret = tail call i8 @foo() to the worklist
    4. Update %ret to $[0,1) = \bot \cup [0,1)$, as the old range is $\bot$ and the new range is $[0,1)$.
  3. Visit default:
    1. Mark edge default -> end as feasible (markEdgeExecutable) and push %phi to the worklist
    2. Update %phi as the union of 0 and %ret -> $[0,1) \cup [0,1) = [0,1)$
    3. SCCP ends with %phi unchanged.
  4. The final ret range of foo is $[0,1)$

After this patch, SCCP performed analysis as follows:

  1. PredicateInfo:
    1. Insert no-op predicate info %ret.default = bitcast i8 %ret to i8 for edge switch -> default
  2. Visit switch:
    1. %ret = tail call i8 @foo() --> $\bot$ (unknown), as the ret range of foo is unknown.
    2. handlePredicateInfo: %ret.default = bitcast i8 %ret to i8 -> $[1,0) = \bot \cup \big([1,0) \cap \bot\big) = \text{OrigCR} \cup (\text{ImposedCR} \cap \text{RetCR})$, where $\text{ImposedCR} = [1,0)$ encodes the edge constraint.
    3. Mark edges switch -> end and switch -> default as feasible (markEdgeExecutable)
  3. Visit end:
    1. %phi = phi i8 [ 0, %switch ], [ %ret.default, %default ] -> $[0,1)$ (a constant range), as edge default -> end is not yet feasible.
    2. ret i8 %phi -> set the ret range of foo to $[0,1)$
    3. Add user %ret = tail call i8 @foo() to the worklist
    4. Update %ret to $[0,1) = \bot \cup [0,1)$, as the old range is $\bot$ and the new range is $[0,1)$.
    5. Add user %ret.default to the worklist to update it.
    6. handlePredicateInfo: %ret.default = bitcast i8 %ret to i8 -> $[1,0) = [1,0) \cup \big([1,0) \cap [0,1)\big) = \text{OrigCR} \cup (\text{ImposedCR} \cap \text{RetCR})$.
  4. Visit default:
    1. Mark edge default -> end as feasible (markEdgeExecutable) and push %phi to the worklist
    2. Update %phi as the union of 0 and %ret.default -> $[0,1) \cup [1,0) = \top$
    3. Update the ret range of foo to $\top$, i.e., overdefined
    4. ... SCCP ends with %phi unchanged.
  5. The final ret range of foo is $\top$, i.e., overdefined.
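
Both walkthroughs can be replayed with a small Python sketch. It is a toy model under stated assumptions, not SCCP itself: sets of i8 values stand in for ConstantRange, `None` stands for $\bot$, the helper names `merge` and `constrain` are hypothetical, and an unknown operand is assumed to add no constraint to the intersection, as in the steps above.

```python
from typing import FrozenSet, Optional

Lattice = Optional[FrozenSet[int]]  # None models bottom (unknown)
FULL = frozenset(range(256))        # top / overdefined for i8
ZERO = frozenset({0})               # the range [0,1)
NONZERO = FULL - ZERO               # the wrapped range [1,0)


def merge(old: Lattice, new: Lattice) -> Lattice:
    """Monotone update (union), in the spirit of mergeInValue."""
    if old is None:
        return new
    if new is None:
        return old
    return old | new


def constrain(imposed: FrozenSet[int], copy: Lattice) -> FrozenSet[int]:
    """ImposedCR ∩ CopyCR; an unknown CopyCR adds no constraint (assumption)."""
    return imposed if copy is None else imposed & copy


def before_patch() -> Lattice:
    ret: Lattice = None
    phi: Lattice = ZERO              # only edge switch -> end is feasible
    ret = merge(ret, phi)            # ret range of foo flows back into %ret
    phi = merge(phi, ret)            # edge default -> end becomes feasible
    return phi


def after_patch() -> Lattice:
    ret: Lattice = None
    ret_default = merge(None, constrain(NONZERO, ret))         # [1,0)
    phi: Lattice = ZERO
    ret = merge(ret, phi)                                      # [0,1)
    ret_default = merge(ret_default, constrain(NONZERO, ret))  # stays [1,0)
    phi = merge(phi, ret_default)                              # [0,1) ∪ [1,0)
    return phi


print(before_patch() == ZERO, after_patch() == FULL)
```

The model makes the sticking point visible: the early $[1,0)$ recorded for %ret.default survives the later refinement of %ret, because the union can never remove values.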

@dtcxzyw dtcxzyw closed this Dec 10, 2025
@dtcxzyw dtcxzyw deleted the test-run18974668236 branch December 10, 2025 15:29