
Conversation

@JustusAdam
Contributor

I found there to be issues with loops where the body of the loop is not connected to either the state before or after.

This PR provides a simple test case that illustrates the problem.

The user_data_ref variable isn't found in the PDG at all, and I believe neither is the call to push (the latter I only verified in the virtually identical Paralegal test case, where I could dump both the PDG and the MIR to check which location corresponded to the call).

@JustusAdam JustusAdam marked this pull request as ready for review January 19, 2024 20:20
@Jatkingmodern

What’s going wrong (most likely)

No control-dependence edges for loops
If the PDG is built from plain def–use edges only (plus branch-to-true/false), the loop body ends up “floating” unless you add control dependences from the loop header’s branch to every instruction in the body that does not post-dominate the header (in practice, the entire body). Result: the loop body seems unconnected.

Broken SSA across back-edges (missing φ nodes)
Induction vars / refs created inside the loop must reach uses through φ. If φ nodes aren’t created/linked for i, user_data_ref, etc., the body’s uses have no incoming defs from the back-edge and get dropped.

Address-taken / reference values treated as “no-def”
user_data_ref = &i; creates a pointer/reference value (an address). If your PDG only tracks “register/SSA” defs but not address-taken/memory objects, the node may be eliminated. You need a memory object for i (or a byref temporary) and a def for &i (pointer-producing op).
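
A minimal sketch of what “modeling it” could look like at PDG-construction time. The helper names (createNode, markAsValueDef, addMemEdge) mirror the pseudocode further down in this comment, and the memory-object lookup for i is an assumption, not an existing API:

void modelAddressOf(PDG& pdg, Instruction& addrOf, MemObject& pointee) {
  Node* n = pdg.createNode(&addrOf);
  pdg.markAsValueDef(n);             // the produced pointer/reference is a real value def
  pdg.addMemEdge(&pointee, n, Ref);  // tie it to the memory object of the pointee (i)
}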

Calls elided by missing side-effect summary
v.push(..) may not appear if:

you filter “pure” calls (but push is not pure), or

you never connect calls to the memory object for v (vector buffer / length / capacity).

Without MemorySSA / ModRef edges, the call node can be garbage-collected as “unused”.

Minimal failing test (close to your description)
fn f(n: usize) {
    let mut v: Vec<usize> = Vec::new();
    for i in 0..n {
        let user_data_ref = &i;
        v.push(*user_data_ref);
    }
}

Expected PDG essentials

Nodes: alloc v, init Vec::new, loop header (cond), i φ, addr_of i, load *user_data_ref, call v.push, loop latch/increment.

Data deps:

i φ → addr_of i → load → call push(arg)

v (memory obj) ↔ call push (ModRef).

Control deps:

loop header branch → {all loop body instructions & latch}.

Fix plan (surgical)
A) Control-dependence for loops (must-have)

Compute post-dominators and control dependences (Ferrante–Ottenstein–Warren):

For a branch B with successors S1 and S2, a node N is control dependent on B if N post-dominates one of the successors (say S1) but does not post-dominate B itself.

Add PDG control edges from the loop header’s conditional branch to:

every instruction in the loop body,

the latch/increment block,

and to the back-edge target (usually the header) where appropriate.

Implementation sketch:

for (auto* br : function.branches()) {
  for (auto* n : region_reachable_only_via_subset_of_successors(br)) {
    pdg.addControlDep(br, n);
  }
}
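
For completeness, a more explicit sketch of how region_reachable_only_via_subset_of_successors could be realized with post-dominators, following the Ferrante–Ottenstein–Warren construction. The Function/PostDomTree/PDG accessors here are assumed pseudocode in the same spirit as the sketch above, not a specific library API:

void addControlDeps(Function& F, PostDomTree& PDT, PDG& pdg) {
  for (auto* A : F.blocks()) {                    // A: potential branch block
    for (auto* B : A->successors()) {
      if (PDT.postDominates(B, A))
        continue;                                 // B post-dominates A: no dependence via this edge
      auto* stop = PDT.immediatePostDominator(A);
      // Every block from B up the post-dominator tree to (but not including)
      // ipdom(A) is control dependent on A's terminating branch.
      for (auto* N = B; N != stop; N = PDT.immediatePostDominator(N))
        pdg.addControlDep(A->terminator(), N);    // in practice: each instruction in N
    }
  }
}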

B) SSA across back-edges (value φ)

Build SSA before PDG or on-the-fly with renaming:

Insert φ for loop-carried defs (i, accumulators).

Connect incoming edges from preheader and latch to the φ; then φ → uses.

Ensure address-producing instructions (like &i) are modeled as real value defs.

Key rule: any value used in the loop body that changes per iteration needs either a φ or a MemorySSA chain.
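
As a minimal illustration of that rule, a sketch of wiring one loop-carried value (e.g. i) through a φ node. createPhi/addDataDep and the surrounding types are assumptions consistent with the pseudocode in this comment, not a real API:

Node* wireLoopCarried(PDG& pdg, Node* preheaderDef, Node* latchDef,
                      const std::vector<Node*>& usesInBody) {
  Node* phi = pdg.createPhi();            // φ merging the two incoming defs
  pdg.addDataDep(preheaderDef, phi);      // initial value entering the loop
  pdg.addDataDep(latchDef, phi);          // value carried around the back-edge
  for (Node* use : usesInBody)
    pdg.addDataDep(phi, use);             // body uses read the φ, not the raw defs
  return phi;
}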

C) Memory modeling (MemorySSA / ModRef)

Introduce MemorySSA nodes (MemoryDef/MemoryUse) for:

stores/loads of locals with their memory objects,

calls; attach a Mod/Ref summary per call.

For v.push(..), mark:

Mod of v’s buffer/len/capacity object,

Ref of the argument memory (if pointer/alias-able),

Start with a conservative summary; refine it later via alias analysis.

Minimal conservative solution:

auto* callNode = pdg.addCall(inst);
pdg.addMemEdge(v_memobj, callNode, Mod); // push mutates v
pdg.addMemEdge(callNode, v_memobj, Def); // result state of v
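
One possible shape for the summarizeCallModRef used later in the pseudo-patch, kept deliberately conservative. The Call/Arg accessors and the summary struct are assumptions, not an existing API:

struct ModRefSummary { bool mod = false; bool ref = false; };

ModRefSummary summarizeCallModRef(const Call& call) {
  ModRefSummary s;
  for (const Arg& a : call.args()) {
    if (a.isMutableRef())      s.mod = true;   // e.g. &mut self in Vec::push
    else if (a.isSharedRef())  s.ref = true;
  }
  if (!call.hasKnownCallee())
    s.mod = s.ref = true;                      // unknown callee: assume the worst
  return s;
}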

D) Keep call nodes (don’t dead-code-eliminate)

If you have a PDG “pruner”, never drop calls with Mod/Ref ≠ None.

When in doubt, keep all intraprocedural calls unless proven pure.
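
A minimal pruning predicate along those lines (a sketch; Node and ModRefSummary are the assumed types from the snippets above):

bool PDGPruner::mayDrop(const Node& N) const {
  if (!N.isCall())
    return !N.hasUsers();                      // plain dead value with no users
  ModRefSummary s = summarizeCallModRef(N.call());
  return !s.mod && !s.ref;                     // only drop calls proven side-effect free
}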

E) Ensure loop blocks are in CFG region walk

Some builders start from the entry block and stop at post-dominated regions; make sure back-edges don’t break your traversal. Always traverse the full CFG with a worklist and a visited set, so that blocks reached only through back-edges are still discovered without looping forever (see the sketch below).
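
A sketch of such a traversal, assuming a simple Function/BasicBlock interface like the one used elsewhere in this comment:

void visitAllBlocks(Function& F, const std::function<void(BasicBlock&)>& visit) {
  std::unordered_set<BasicBlock*> seen;
  std::vector<BasicBlock*> work{ F.entryBlock() };
  while (!work.empty()) {
    BasicBlock* BB = work.back();
    work.pop_back();
    if (!seen.insert(BB).second)
      continue;                                // already handled; a back-edge lands here harmlessly
    visit(*BB);
    for (BasicBlock* Succ : BB->successors())  // push successors, including back-edge targets
      work.push_back(Succ);
  }
}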

Quick pseudo-patch (illustrative C++-like steps)
void PDGBuilder::run(Function& F) {
  // 1) CFG analyses
  auto DT  = computeDominators(F);
  auto PDT = computePostDominators(F);
  auto CDG = computeControlDependence(F, PDT);  // Ferrante et al.

  // 2) SSA / φ
  promoteToSSA(F, DT);                          // value SSA (incl. loop-carried φ)
  auto MSSA = buildMemorySSA(F);                // MemorySSA for loads/stores/calls

  // 3) Create PDG nodes
  for (auto& BB : F) {
    for (auto& I : BB) {
      Node* N = pdg.createNode(&I);
      if (isAddressOf(I)) pdg.markAsValueDef(N);
      if (isCall(I)) {
        auto modref = summarizeCallModRef(I);   // conservative: ModRef on &v
        attachMemoryEdgesFromSummary(N, modref, MSSA);
      }
      connectValueDefsUses(N, I);               // SSA-based
    }
  }

  // 4) Control deps
  for (auto& [src, targets] : CDG) {
    for (auto* T : targets) {
      pdg.addControlDep(pdg.nodeOf(src), pdg.nodeOf(T));
    }
  }

  // 5) Keep side-effectful calls
  pdg.keepSideEffectfulCalls();
}

Corner cases to include

&mut vs & references → different memory effects.

Inlined push vs method call → normalize to a call node or summary node.

Loops with break/continue → control deps from those branches too.

Sanity checks (add to your PR test)

Presence assertions:

PDG has a node for the push call.

PDG has a node for addr_of i (or equivalent MIR op for &i).

There is a control edge from loop header branch → push node.

Path checks:

There is a data path: i φ → addr_of i → load → push(arg).

There is a memory path: v_memobj ↔ push (Mod/Def/Ref).

Example (pseudo) test API:

assert!(pdg.contains_node("call Vec::push"));
assert!(pdg.path_exists("phi i", "call Vec::push", Kind::Data));
assert!(pdg.control_dep_exists("loop_header_br", "call Vec::push"));
assert!(pdg.mem_edge_exists("v_mem", "call Vec::push", Mod));

Parallelizing the pass (safe + simple)

PDG construction has global dependencies (dom/postdom, MemorySSA), but you can parallelize safely:

Parallel per function / SCC
Run the full PDG build for each function in a thread pool (no shared state except read-only module tables).

Within a function: stage parallelism

Stage A (sequential): compute DT, PDT, MemorySSA.

Stage B (parallel): create instruction nodes & local def–use:

Partition basic blocks into groups (e.g., by reverse postorder strata).

Each worker scans its blocks, creates nodes, records local edges into a thread-local buffer.

Stage C (sequential merge): merge edges; add control deps; finalize call summaries.

Skeleton:

ThreadPool pool;
for (auto* Fn : module.functions()) {
  pool.enqueue([this, Fn] {
    auto DT   = computeDominators(*Fn);
    auto PDT  = computePostDominators(*Fn);
    auto MSSA = buildMemorySSA(*Fn);
    parallel_for(block_groups(*Fn), [&](auto& group) {
      build_nodes_and_local_edges(group, MSSA);
    });
    merge_and_finalize_control_deps(PDT);
  });
}
pool.wait();

Data structures

Use thread-local vectors for edges, then a single merge.

Use a concurrent hash map only for the instruction→node mapping, or pre-assign node IDs in a first pass to avoid contention.
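
A sketch of the thread-local-buffer idea; EdgeRecord, BlockGroup, localEdgesOf, and the index-based parallel_for are assumptions matching the skeleton above, not a real API:

struct EdgeRecord { NodeId src, dst; EdgeKind kind; };

void buildEdgesParallel(const std::vector<BlockGroup>& groups, PDG& pdg) {
  std::vector<std::vector<EdgeRecord>> perThread(groups.size());
  parallel_for(size_t(0), groups.size(), [&](size_t g) {
    for (const EdgeRecord& e : localEdgesOf(groups[g]))  // def–use edges within the group
      perThread[g].push_back(e);                         // no locking: private buffer
  });
  for (const auto& buf : perThread)                      // single sequential merge
    for (const EdgeRecord& e : buf)
      pdg.addEdge(e.src, e.dst, e.kind);
}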

Why this will fix your specific symptoms

Loop body “not connected” → adding control dependences from the loop header branch and φ-propagation stitches it back.

user_data_ref missing → model address-of and reference as value defs; attach to MemorySSA object of i if needed.

push missing → keeping side-effectful calls plus ModRef edges to v’s memory object ensures the call remains and is visible in PDG queries.

If you can share the tiny MIR/IR of your failing case, I can map exact instruction IDs → PDG nodes and tailor the control-dependence set for your CFG shape (esp. preheader, header, body, latch).
