perf(verification): shrink and speed up SMT verification conditions by alcides · Pull Request #408 · alcides/aeon

alcides · 2026-06-21T17:52:36Z

Summary

Three improvements to the liquid-type verification path (the SMT bottleneck that dominates synthesis time), all behaviour-preserving on the standard (GIL) build:

flatten: O(depth²) → O(depth) alpha-renaming. The eager per-binder rename rebuilt the entire remaining constraint at every ∀, making substitution_in_liquid the single hottest function in the system. Renamings are now deferred and applied in one pass per premise/conclusion. ~2.2× faster flatten on deep constraints; verified structurally identical to the old output over 300+ real obligations.
VC simplification before Z3 (_simplify_vc): equality elimination + relevance slicing.
- Equality elimination (exact): a premise x == Y with x a universally-quantified binder and Y free of x pins x, so substitute x := Y and drop both. Collapses the ANF h_i == e_i chains.
- Relevance slicing: keep only premises/functions/variables transitively connected to the goal — removes the irrelevant open-library declarations that bloat VCs.
- 3.48× faster validation on deep-constraint programs; 0 verdict changes over 4,460 obligations across 44 programs.
Thread-local Z3 context (_WS) + lock-atomic fresh_counter. Makes the SMT layer safe for parallel candidate validation under free-threaded CPython (python3.14t). The main thread keeps the default Z3 context, so external importers and GIL-build behaviour are byte-identical. Enables ~1.4–2× parallel validation of synthesis populations on 3.14t. (The parallel driver itself is not yet wired into the GP loop — this lands the infrastructure + correctness.)

Motivating example: `supermario`

supermario's typecheck is dominated by one obligation — the synthesis-target VC, which accumulates the entire open Array library + inductive context into a single 3.5M-char formula. Dissected:

flatten   484 ms   (15%)   ← the O(depth²) renaming (fix #1)
translate 2333 ms  (73%)   ← building Z3 ASTs for a mostly-irrelevant formula (fix #2)
z3 solve     1 ms  (~0%)   ← it was never a hard solve

After #1+#2 that obligation drops 3604 ms → 1112 ms (3.2×).

How fast is synthesis in the different backends

Throughput per backend, standard (GIL) build, seed=1, 15 s budget, with these changes in place:

even_parity.ae (boolean GP benchmark)

backend	candidates	cand/s
gp	706	43
random_search	655	43
enumerative	1345	89
hc	656	43
1p1	654	43

koza_quartic.ae (symbolic regression, deeper float constraints)

backend	candidates	cand/s
gp	359	21
random_search	308	20
enumerative	156	10
hc	311	20
1p1	306	20

The backends bottleneck on the same validation path, so the relative ordering is driven by candidate-generation cost (enumerative is cheap-per-candidate on shallow boolean goals but expensive on deep ones). The optimizations accelerate the validation phase shared by all of them — small for shallow goals like even_parity, large (up to ~3.5×) for deep-constraint synthesis (lists, ADTs, supermario-style).

The factory-routed backends (synquid, sygus, afta, cata) couldn't be benchmarked here: this machine's scipy wheel fails to dlopen (macOS __thread_bss), which breaks the CLI's decision_tree import. The five geneticengine backends run via the test harness, which bypasses it.

Soundness & testing

200+ single-threaded SMT/typechecking tests pass. New regression tests: tests/flatten_rename_test.py, tests/alpha_key_test.py.
Verdict differential: 0 mismatches over 4,460 obligations / 44 programs (simplification on vs off).
Parallel == serial verified for the thread-local path across every measured workload.
mypy + ruff clean.

Caveats (please review)

Relevance slicing is always sound (dropping premises only weakens the hypothesis — it can never make an invalid program typecheck) but is theoretically incomplete in one narrow case: a premise that is both disconnected from the goal and internally contradictory (a dead-code refinement context) would vacuously discharge the goal, and dropping it could turn that into a false reject. Not observed across the 4,460-obligation differential. A "re-check the full VC on failure" fallback was deliberately omitted (during synthesis most candidates are invalid, so it would erase the speedup). If preferred, slicing can be gated to the synthesis path only, keeping the user-facing typechecker provably complete — happy to do that.
The free-threading refactor (User-defined types are not BasicTypes #3) is infrastructure; the parallel GP driver is a follow-up.

🤖 Generated with Claude Code

Three improvements to the liquid-type verification path, all behaviour-preserving on the standard (GIL) build (200+ SMT/typechecking tests pass): 1. flatten: O(depth^2) -> O(depth) binder alpha-renaming. The eager per-binder rename rebuilt the whole remaining constraint at every forall, making substitution_in_liquid the hottest function in the system. Renamings are now deferred and applied in one pass per premise/conclusion. ~2.2x faster flatten on deep constraints; verified structurally identical to the old output over 300+ real obligations (tests/flatten_rename_test.py). 2. VC simplification before Z3 (smt._simplify_vc): equality elimination + relevance slicing. Equality elimination (exact) substitutes x:=Y for premises x==Y and drops the binder, collapsing ANF h_i==e_i chains. Relevance slicing keeps only premises/functions/variables transitively connected to the goal, removing irrelevant `open`-library declarations. 3.48x faster validation on deep-constraint programs (supermario's 3.5M-char obligation 3.6s -> 1.1s); 0 verdict changes over 4,460 obligations across 44 programs. Slicing is always sound; see the docstring for the narrow disconnected-inconsistent-premise completeness caveat. 3. Thread-local Z3 context (smt._WS) + lock-atomic fresh_counter: makes the SMT layer safe for parallel candidate validation under free-threaded CPython (python3.14t). The main thread keeps the default Z3 context, so external importers and GIL-build behaviour are unchanged. Enables ~1.4-2x parallel validation of synthesis populations on 3.14t (the parallel driver itself is not yet wired in). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

alcides closed this Jun 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(verification): shrink and speed up SMT verification conditions#408

perf(verification): shrink and speed up SMT verification conditions#408
alcides wants to merge 1 commit into
masterfrom
claude/synthesis-vc-speedups

alcides commented Jun 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

alcides commented Jun 21, 2026

Summary

Motivating example: supermario

How fast is synthesis in the different backends

Soundness & testing

Caveats (please review)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Motivating example: `supermario`