perf(verification): shrink and speed up SMT verification conditions#408
Closed
alcides wants to merge 1 commit into
Three improvements to the liquid-type verification path, all behaviour-preserving on the standard (GIL) build (200+ SMT/typechecking tests pass):

1. flatten: O(depth^2) -> O(depth) binder alpha-renaming. The eager per-binder rename rebuilt the whole remaining constraint at every forall, making substitution_in_liquid the hottest function in the system. Renamings are now deferred and applied in one pass per premise/conclusion. ~2.2x faster flatten on deep constraints; verified structurally identical to the old output over 300+ real obligations (tests/flatten_rename_test.py).

2. VC simplification before Z3 (smt._simplify_vc): equality elimination + relevance slicing. Equality elimination (exact) substitutes x:=Y for premises x==Y and drops the binder, collapsing ANF h_i==e_i chains. Relevance slicing keeps only premises/functions/variables transitively connected to the goal, removing irrelevant `open`-library declarations. 3.48x faster validation on deep-constraint programs (supermario's 3.5M-char obligation 3.6s -> 1.1s); 0 verdict changes over 4,460 obligations across 44 programs. Slicing is always sound; see the docstring for the narrow disconnected-inconsistent-premise completeness caveat.

3. Thread-local Z3 context (smt._WS) + lock-atomic fresh_counter: makes the SMT layer safe for parallel candidate validation under free-threaded CPython (python3.14t). The main thread keeps the default Z3 context, so external importers and GIL-build behaviour are unchanged. Enables ~1.4-2x parallel validation of synthesis populations on 3.14t (the parallel driver itself is not yet wired in).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
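For reviewers unfamiliar with the pattern in item 3, here is a minimal stdlib-only sketch of a lock-protected fresh-name counter next to per-thread state. It is not the code in this PR: `fresh_name`, `_Workspace`, `_ws`, and `run_demo` are hypothetical names, and the `context` attribute is a plain stand-in object where the real layer would presumably hold a per-thread Z3 context.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_counter = 0
_counter_lock = threading.Lock()


def fresh_name(prefix: str = "fresh") -> str:
    """Return a name that no other thread can receive."""
    global _counter
    # The read-increment-write must be atomic; without the lock two threads
    # on a free-threaded build could observe the same counter value.
    with _counter_lock:
        n = _counter
        _counter += 1
    return f"{prefix}_{n}"


class _Workspace(threading.local):
    """Per-thread state: __init__ runs once in every thread that touches it."""

    def __init__(self) -> None:
        if threading.current_thread() is threading.main_thread():
            # Stand-in for "keep using the default context" on the main thread.
            self.context = None
        else:
            # Stand-in for a private context owned by this worker thread.
            self.context = object()


_ws = _Workspace()


def _validate(candidate: int) -> tuple[str, bool]:
    # Each worker draws a unique name and sees its own private context.
    return fresh_name("v"), _ws.context is not None


def run_demo(n: int = 200, workers: int = 8) -> list[tuple[str, bool]]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(_validate, range(n)))
```

Calling `run_demo()` yields one distinct name per candidate, and every worker reports a non-default context, which is the property parallel validation relies on.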
Summary
Three improvements to the liquid-type verification path (the SMT bottleneck that dominates synthesis time), all behaviour-preserving on the standard (GIL) build:
1. flatten: O(depth²) → O(depth) alpha-renaming. The eager per-binder rename rebuilt the entire remaining constraint at every ∀, making substitution_in_liquid the single hottest function in the system. Renamings are now deferred and applied in one pass per premise/conclusion. ~2.2× faster flatten on deep constraints; verified structurally identical to the old output over 300+ real obligations.
2. VC simplification before Z3 (_simplify_vc): equality elimination + relevance slicing.
   - Equality elimination: x == Y with x a universally-quantified binder and Y free of x pins x, so substitute x := Y and drop both. Collapses the ANF h_i == e_i chains.
   - Relevance slicing: keeps only premises/functions/variables transitively connected to the goal, removing the irrelevant `open`-library declarations that bloat VCs.
3. Thread-local Z3 context (_WS) + lock-atomic fresh_counter. Makes the SMT layer safe for parallel candidate validation under free-threaded CPython (python3.14t). The main thread keeps the default Z3 context, so external importers and GIL-build behaviour are byte-identical. Enables ~1.4–2× parallel validation of synthesis populations on 3.14t. (The parallel driver itself is not yet wired into the GP loop; this lands the infrastructure + correctness.)

Motivating example: supermario
supermario's typecheck is dominated by one obligation: the synthesis-target VC, which accumulates the entire open Array library + inductive context into a single 3.5M-char formula. Dissected:
After #1+#2 that obligation drops 3604 ms → 1112 ms (3.2×).
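To make the two VC simplifications concrete, here is a small sketch on a toy term representation. It is not `_simplify_vc` itself: the tuple encoding and the names `free_vars`, `substitute`, `eliminate_equalities`, and `slice_relevant` are assumptions for illustration, the equality rule only matches a binder on the left-hand side, and slicing follows shared variables only (the description also mentions functions).

```python
from typing import Union

# Toy terms: str = variable, int = literal, tuple = (operator, *arguments).
Term = Union[str, int, tuple]


def free_vars(t: Term) -> set[str]:
    if isinstance(t, str):
        return {t}
    if isinstance(t, tuple):
        out: set[str] = set()
        for arg in t[1:]:  # t[0] is the operator name, not a variable
            out |= free_vars(arg)
        return out
    return set()


def substitute(t: Term, name: str, value: Term) -> Term:
    if isinstance(t, str):
        return value if t == name else t
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(a, name, value) for a in t[1:])
    return t


def eliminate_equalities(binders, premises, goal):
    """Drop premises `x == Y` (x a binder, x not free in Y) after x := Y."""
    binders, premises = list(binders), list(premises)
    changed = True
    while changed:
        changed = False
        for i, p in enumerate(premises):
            if isinstance(p, tuple) and len(p) == 3 and p[0] == "==":
                x, y = p[1], p[2]
                if isinstance(x, str) and x in binders and x not in free_vars(y):
                    rest = premises[:i] + premises[i + 1:]
                    premises = [substitute(q, x, y) for q in rest]
                    goal = substitute(goal, x, y)
                    binders.remove(x)
                    changed = True
                    break  # restart: the premise list was rebuilt
    return binders, premises, goal


def slice_relevant(premises, goal):
    """Keep premises that share a variable, transitively, with the goal."""
    relevant = free_vars(goal)
    kept = [False] * len(premises)
    changed = True
    while changed:
        changed = False
        for i, p in enumerate(premises):
            if not kept[i] and free_vars(p) & relevant:
                kept[i] = True
                relevant |= free_vars(p)
                changed = True
    # Premises never reached are dropped, including any that are
    # inconsistent on their own (the completeness caveat noted in the PR).
    return [p for i, p in enumerate(premises) if kept[i]]


binders = ["h1", "h2", "x", "z"]
premises = [
    ("==", "h1", ("+", "x", 1)),
    ("==", "h2", ("*", "h1", 2)),
    (">", "x", 0),
    (">", "z", 5),  # unrelated to the goal
]
goal = (">", "h2", 0)

binders2, premises2, goal2 = eliminate_equalities(binders, premises, goal)
sliced = slice_relevant(premises2, goal2)
```

In this toy run the two ANF-style equalities collapse into the goal, leaving binders `x` and `z`, and slicing then discards the premise about `z` because it shares no variable with the goal.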
How fast is synthesis in the different backends
Throughput per backend, standard (GIL) build, seed=1, 15 s budget, with these changes in place:
even_parity.ae (boolean GP benchmark)
koza_quartic.ae (symbolic regression, deeper float constraints)
The backends bottleneck on the same validation path, so the relative ordering is driven by candidate-generation cost (enumerative is cheap-per-candidate on shallow boolean goals but expensive on deep ones). The optimizations accelerate the validation phase shared by all of them: small for shallow goals like even_parity, large (up to ~3.5×) for deep-constraint synthesis (lists, ADTs, supermario-style).
Soundness & testing
tests/flatten_rename_test.py, tests/alpha_key_test.py.
Caveats (please review)
🤖 Generated with Claude Code