# Parallel project notes

We track notes for updates to
[smt/smt_parallel.cpp](https://github.com/Z3Prover/z3/blob/master/src/smt/smt_parallel.cpp)
and possibly
[solver/parallel_tactic.cpp](https://github.com/Z3Prover/z3/blob/master/src/solver/parallel_tactic.cpp).

## Variable selection heuristics

* Lookahead solvers:
  * lookahead in the smt directory performs a simplistic lookahead search using unit propagation.
  * lookahead in the sat directory uses a custom lookahead solver based on MARCH. March is described in the Handbook of Satisfiability and Knuth Volume 4.
  * They both rely on a cost model where the most useful variable to branch on is the one that _maximally shrinks_ the remaining clauses
    through unit propagation. In other words, if a literal _p_ is set to true, and $\neg p$ occurs in a clause $\neg p \vee q \vee r$, then propagation
    reduces the clause from size 3 to 2 (because $\neg p$ becomes false after propagating _p_).
  * Selected references: SAT handbook, Knuth Volume 4, Marijn's March solver on GitHub, [implementation of march in z3](https://github.com/Z3Prover/z3/blob/master/src/sat/sat_lookahead.cpp)
* VSIDS:
  * As referenced in Matteo and Antti's solvers.
  * Variable activity is a proxy for how useful it is to case split on a variable during search. Variables with a higher VSIDS score are split first.
  * VSIDS is updated dynamically during search. It was introduced in the Chaff paper by Moskewicz, Malik, et al. in the early 2000s. A good overview is in Armin's tutorial slides (also in my overview of SMT).
  * VSIDS does not keep track of variable phases (whether the variable was set to true or false).
  * Selected references: [DAC 2001](https://www.princeton.edu/~chaff/publication/DAC2001v56.pdf) and [Biere Tutorial, slide 64 on Variable Scoring Schemes](https://alexeyignatiev.github.io/ssa-school-2019/slides/ab-satsmtar19-slides.pdf)
* Proof prefix:
  * Collect the literals that occur in learned clauses. Count their occurrences based on polarity. This gets tracked in a weighted score.
  * The weight function can be formulated to take clause sizes into account.
  * The score assignment may also decay, similar to VSIDS.
  * We could also use a doubly linked list of literals used in conflicts and re-insert literals into the list when they are used. This would be a "variable move to front" (VMTF) variant.
  * Selected references: [Battleman et al](https://www.cs.cmu.edu/~mheule/publications/proofix-SAT25.pdf)
* From local search:
  * Note also that local search solvers can be used to assign variable branch priorities.
  * We are not going to directly run a local search solver in the mix up front, but let us consider this heuristic for completeness.
  * The heuristic is documented in Biere and Cai's journal paper on integrating local search into CDCL.
  * Roughly, it considers clauses that move from the UNSAT set to the SAT set of clauses. It then keeps track of the literals involved.
  * Selected references: [Cai et al](https://www.jair.org/index.php/jair/article/download/13666/26833/)
* Assignment trails:
  * We could also consider the assignments to variables during search.
  * Variables that are always assigned the same truth value could be considered safe to assign that truth value.
  * The cubes resulting from such variables might be a direction towards finding satisfying solutions.
  * Selected references: [Alex and Vadim](https://link.springer.com/chapter/10.1007/978-3-319-94144-8_7) and most recently [Robin et al](https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SAT.2024.9).
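
As a concrete illustration of the proof-prefix scoring idea above, here is a minimal sketch (class and method names are hypothetical, not z3 code) of a decayed, polarity-aware occurrence score over literals in learned clauses, where smaller clauses contribute more:

```python
from collections import defaultdict

class LiteralScores:
    """Toy sketch: polarity-aware, decayed occurrence counts for literals
    seen in learned clauses. Weights and decay factor are illustrative."""

    def __init__(self, decay=0.95):
        self.decay = decay
        self.score = defaultdict(float)  # (var, polarity) -> weighted count

    def on_learned_clause(self, clause):
        # Literals are signed ints; smaller clauses weigh more, e.g. 1/|C|.
        w = 1.0 / len(clause)
        for lit in clause:
            self.score[(abs(lit), lit > 0)] += w

    def on_decay(self):
        # Periodic decay of all scores, similar in spirit to VSIDS.
        for key in self.score:
            self.score[key] *= self.decay

    def best_split(self):
        # The (variable, polarity) pair with the highest score is the
        # candidate cube literal.
        return max(self.score, key=self.score.get)
```

A usage example: after learning clauses `[3, -5]`, `[3, 7, -5]`, and `[3]`, the literal `3` (variable 3, positive phase) dominates and is picked as the split candidate.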

## Algorithms

This section considers various possible algorithms.
In the following, $F$ refers to the original goal and $T$ is the number of CPU cores or CPU threads.

### Base algorithm

The existing algorithm in <b>smt_parallel</b> is as follows:

1. Run a solver on $F$ with a bounded number of conflicts.
2. If the result is SAT/UNSAT, or UNKNOWN with an interrupt or timeout, return. If the maximal number of conflicts was reached, continue.
3. Spawn $T$ solvers on $F$ with a bounded number of conflicts; wait until a thread returns UNSAT/SAT or all threads have reached the maximal number of conflicts.
4. Perform a similar check as in 2.
5. Share unit literals learned by each thread.
6. Compute a unit cube for each of the $T$ threads.
7. Spawn $T$ solvers with $F \wedge \ell$, where $\ell$ is a unit literal determined by a lookahead function in each thread.
8. Perform a similar check as in 2. But note that a thread can be UNSAT because the unit cube $\ell$ contradicted $F$. In this case learn the unit literal $\neg \ell$.
9. Share unit literals learned by each thread, increase the maximal number of conflicts, and go to 3.
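
The control flow of steps 1-9 can be sketched as follows. `solve(F, assumptions, max_conflicts)` and `pick_cube_literal` are hypothetical callbacks standing in for the bounded solver and the per-thread lookahead; this is not z3's actual API, and step 8 is simplified (the real algorithm inspects the unsat core to distinguish a genuine UNSAT of $F$ from one caused by $\ell$):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_solve(F, T, solve, pick_cube_literal, max_conflicts=100):
    """Sketch of the smt_parallel loop with hypothetical callbacks."""
    shared_units = set()
    # Steps 1-2: a single bounded probe first.
    r = solve(F, frozenset(), max_conflicts)
    if r in ("sat", "unsat"):
        return r
    while True:
        # Steps 3-4: T bounded solvers on F; shared units passed as assumptions.
        with ThreadPoolExecutor(max_workers=T) as ex:
            results = list(ex.map(
                lambda _: solve(F, frozenset(shared_units), max_conflicts),
                range(T)))
        for r in results:
            if r in ("sat", "unsat"):
                return r
        # Steps 5-7: one cube literal per thread, then solve F /\ l.
        cubes = [pick_cube_literal(F, i) for i in range(T)]
        with ThreadPoolExecutor(max_workers=T) as ex:
            results = list(ex.map(
                lambda l: solve(F, frozenset(shared_units | {l}), max_conflicts),
                cubes))
        # Step 8 (simplified): UNSAT under cube l is treated as ruling out l,
        # so the unit -l is learned and shared.
        for l, r in zip(cubes, results):
            if r == "sat":
                return "sat"
            if r == "unsat":
                shared_units.add(-l)
        # Step 9: widen the conflict budget and repeat.
        max_conflicts *= 2
```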

### Algorithm Variants

* Instead of using lookahead solving to find unit cubes, use the proof-prefix based scoring function.
* Instead of using independent unit cubes, perform a systematic (where systematic can mean many things) cube and conquer strategy.
* Spawn some threads to work in "SAT" mode, tuned to find models instead of short resolution proofs.
* Change the synchronization barrier discipline.
* [Future] Include in-processing.

### Cube and Conquer strategy

We could maintain a global decomposition of the search space by maintaining a list of _cubes_.
Initially, the list of cubes has just one element, the cube with no literals $[ [] ]$.
By using a list of cubes instead of a _set_ of cubes we can refer to an ordering.
For example, cubes can be ordered by a suffix traversal of the _cube tree_ (the tree formed by
case splitting on the first literal; children of the _true_ branch are the cubes where the first
literal is true, children of the _false_ branch are the cubes where the first literal is false).
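
The ordering can be illustrated by enumerating the leaf cubes of a complete cube tree over a fixed literal order, true branch first (a sketch; literals are signed integers):

```python
def cubes_in_tree_order(literals):
    """Enumerate leaf cubes of the cube tree over `literals`:
    split on the first literal, true branch before false branch."""
    if not literals:
        return [[]]
    first, rest = literals[0], literals[1:]
    subtree = cubes_in_tree_order(rest)
    return ([[first] + c for c in subtree] +
            [[-first] + c for c in subtree])
```

For two literals `[1, 2]` this yields the four cubes `[1, 2]`, `[1, -2]`, `[-1, 2]`, `[-1, -2]` in tree order.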

The main question is going to be how the cube decomposition is created.

#### Static cubing
We can aim for a static cube strategy that uses a few initial (concurrent) probes to find cube literals.
This strategy would be a parallel implementation of the proof-prefix approach. The computed cubes are inserted
into the list of cubes and the list is consumed by a second round.

#### Growing cubes on demand
Based on experience with cubing so far, there is high variance in how easy cubes are to solve.
Some cubes will be harder than others to solve. For hard cubes, it is tempting to develop a recursive
cubing strategy. Ideally, a recursive cubing strategy is symmetric to top-level cubing.

* The solver would have to identify hard cubes vs. easy cubes.
* It would have to know when to stop working on a hard cube and replace it in the list of cubes by
  a new list of sub-cubes.

* Ideally, we don't need any static cubing and cubing is grown on demand while all threads are utilized.
  * If we spawn $T$ threads to initially work with empty cubes, we could extract up to $T$ independent cube literals
    by examining the proof prefixes of their traces. This can form the basis for the first, up to $2^T$, cubes.
  * After a round of solving with each thread churning on some cubes, we may obtain more proof prefixes from
    _hard_ cubes. It is not obvious that we want to share cubes from different proof prefixes at this point.
    But a starting point is to split a hard cube into two by using the proof prefix from attempting to solve it.
  * Suppose we take the proof-prefix sampling algorithm to heart: it says to start with some initial cube prefix
    and then sample for other cube literals. If we translate it to the case where multiple cubes are being processed
    in parallel, then an analogy is to share candidates for new cube literals among cubes that are close to each other.
    For example, if thread $t_1$ processes cube $a, b, c$ and $t_2$ processes $a, b, \neg c$, they are close: they are only
    separated by Hamming distance 1. If $t_1$ finds cube literal $d$ and $t_2$ finds cube literal $e$, we could consider the cubes
    $a, b, c, d, e$, and $a, b, c, d, \neg e$, $\ldots$, $a, b, \neg c, \neg d, \neg e$.
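
The basic split of a hard cube can be sketched as follows (an illustrative helper, not z3 code): the cube at a given position in the list is replaced by its two children over a literal extracted from the proof prefix of the failed attempt:

```python
def split_hard_cube(cubes, hard_index, new_literal):
    """Replace cubes[hard_index] by its two children obtained by
    case-splitting on `new_literal` (a signed integer literal)."""
    hard = cubes[hard_index]
    if new_literal in hard or -new_literal in hard:
        raise ValueError("literal already decided in this cube")
    children = [hard + [new_literal], hard + [-new_literal]]
    # Keep the children adjacent, preserving the overall tree order.
    return cubes[:hard_index] + children + cubes[hard_index + 1:]
```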

#### Representing cubes implicitly

We can represent a list of cubes using intervals, storing only the start- and end-points of the intervals.
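
One possible encoding (an assumption for illustration, not an implemented scheme): with a fixed literal order and true-branch-first ordering, the cube at position `index` is determined by the bits of `index`, so an interval $[lo, hi)$ denotes $hi - lo$ consecutive cubes without storing them:

```python
def cube_from_index(index, literals):
    """Decode cube number `index` over a fixed literal order; the bit for
    literals[i] gives its polarity (0 = true branch, true branch first)."""
    cube = []
    for i, lit in enumerate(literals):
        # The first literal corresponds to the most significant bit.
        bit = (index >> (len(literals) - 1 - i)) & 1
        cube.append(lit if bit == 0 else -lit)
    return cube

def expand_interval(lo, hi, literals):
    """An interval [lo, hi) implicitly represents hi - lo cubes."""
    return [cube_from_index(k, literals) for k in range(lo, hi)]
```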

#### Batching
Threads can work on more than one cube in a batch.

### Synchronization

* The first thread to time out or finish could kill other threads instead of joining on all threads to finish.
* Instead of synchronization barriers, have threads continue concurrently without terminating. They synchronize on signals and new units. This is trickier to implement, but in some guises it is accomplished in [sat/sat_parallel.cpp](https://github.com/Z3Prover/z3/blob/master/src/sat/sat_parallel.cpp)


## Parameter tuning

The idea is to have parallel threads try out different parameter settings and search the parameter space for an optimal parameter setting.

Let us assume that there is a set of tunable parameters $P$. The set comprises named parameters with initial values:
$P = \{ (p_1, v_1), \ldots, (p_n, v_n) \}$.
With each parameter associate a set of mutation functions $+=, -=, *=$, that is, increment, decrement, or scaling by a non-negative multiplier (which can be less than 1).
We initialize a search space of parameter settings from parameters, values, and mutation functions that have assigned reward values. The reward value is incremented
if a parameter mutation step results in an improvement, and decremented if a mutation step degrades performance:
$P = \{ (p_1, v_1, \{ (r_{11}, m_{11}), \ldots, (r_{1k_1}, m_{1k_1}) \}), \ldots, (p_n, v_n, \{ (r_{n1}, m_{n1}), \ldots, (r_{nk_n}, m_{nk_n})\}) \}$.
The initial reward values are fixed (to 1) and the initial values of parameters are the defaults.

* The batch manager maintains a set of candidate parameter settings $CP = \{ (P_1, r_1), \ldots, (P_n, r_n) \}$.
* A worker thread picks up a parameter setting $P_i$ from $CP$ in the batch manager.
* It picks one or more parameters within $P_i$ whose mutation functions have non-zero rewards and applies a mutation.
* It then runs with a batch of cubes.
* It measures the reward for the new parameter setting based on the number of cubes, cube depth, number of timeouts, and completions with their number of conflicts.
* If the new reward is an improvement over $(P_i, r_i)$, it inserts the new parameter setting $(P_i', r_i')$ into the batch manager.
* The batch manager discards the worst parameter settings, keeping the top $K$ ($K = 5$) parameter settings.
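
One worker iteration of this loop can be sketched as follows. Here `candidates`, `mutations`, and `evaluate` are hypothetical stand-ins for the batch manager's candidate set, the per-parameter mutation functions, and a run over a batch of cubes:

```python
import random

def tuning_step(candidates, mutations, evaluate, K=5):
    """One worker step: mutate a candidate parameter setting, evaluate it
    on a cube batch, and keep the top-K settings by reward."""
    params, reward = random.choice(candidates)
    mutated = dict(params)
    # Pick one parameter and apply one of its mutation functions.
    name = random.choice(list(mutated))
    mutated[name] = random.choice(mutations[name])(mutated[name])
    new_reward = evaluate(mutated)
    # Insert only if the mutation improved over the parent setting.
    if new_reward > reward:
        candidates.append((mutated, new_reward))
    candidates.sort(key=lambda pr: pr[1], reverse=True)
    del candidates[K:]
    return candidates
```

A more faithful version would also update per-mutation rewards (incrementing on improvement, decrementing on degradation) as described above.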

When picking among mutation steps with rewards, use a weighted sampling algorithm.
Weighted sampling works as follows: you are given a set of items with weights $(i_1, w_1), \ldots, (i_k, w_k)$.
Let $w = \sum_j w_j$ and pick a random number $w_0$ in the range $0\ldots w$.
Then pick the item $i_n$ such that $n$ is the smallest index with $\sum_{j = 1}^n w_j \geq w_0$.
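
The sampling rule translates directly to code (a sketch; the `rng` argument is injectable so the choice can be tested deterministically):

```python
import random

def weighted_pick(items, weights, rng=random):
    """Return items[n] where n is the smallest index with
    weights[0] + ... + weights[n] >= w0, for uniform w0 in [0, sum]."""
    total = sum(weights)
    w0 = rng.uniform(0, total)
    running = 0.0
    for item, w in zip(items, weights):
        running += w
        if running >= w0:
            return item
    return items[-1]  # guard against floating-point drift
```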

SMT parameters that could be tuned:

<pre>

 arith.bprop_on_pivoted_rows (bool) (default: true)
 arith.branch_cut_ratio (unsigned int) (default: 2)
 arith.eager_eq_axioms (bool) (default: true)
 arith.enable_hnf (bool) (default: true)
 arith.greatest_error_pivot (bool) (default: false)
 arith.int_eq_branch (bool) (default: false)
 arith.min (bool) (default: false)
 arith.nl.branching (bool) (default: true)
 arith.nl.cross_nested (bool) (default: true)
 arith.nl.delay (unsigned int) (default: 10)
 arith.nl.expensive_patching (bool) (default: false)
 arith.nl.expp (bool) (default: false)
 arith.nl.gr_q (unsigned int) (default: 10)
 arith.nl.grobner (bool) (default: true)
 arith.nl.grobner_cnfl_to_report (unsigned int) (default: 1)
 arith.nl.grobner_eqs_growth (unsigned int) (default: 10)
 arith.nl.grobner_expr_degree_growth (unsigned int) (default: 2)
 arith.nl.grobner_expr_size_growth (unsigned int) (default: 2)
 arith.nl.grobner_frequency (unsigned int) (default: 4)
 arith.nl.grobner_max_simplified (unsigned int) (default: 10000)
 arith.nl.grobner_row_length_limit (unsigned int) (default: 10)
 arith.nl.grobner_subs_fixed (unsigned int) (default: 1)
 arith.nl.horner (bool) (default: true)
 arith.nl.horner_frequency (unsigned int) (default: 4)
 arith.nl.horner_row_length_limit (unsigned int) (default: 10)
 arith.nl.horner_subs_fixed (unsigned int) (default: 2)
 arith.nl.nra (bool) (default: true)
 arith.nl.optimize_bounds (bool) (default: true)
 arith.nl.order (bool) (default: true)
 arith.nl.propagate_linear_monomials (bool) (default: true)
 arith.nl.rounds (unsigned int) (default: 1024)
 arith.nl.tangents (bool) (default: true)
 arith.propagate_eqs (bool) (default: true)
 arith.propagation_mode (unsigned int) (default: 1)
 arith.random_initial_value (bool) (default: false)
 arith.rep_freq (unsigned int) (default: 0)
 arith.simplex_strategy (unsigned int) (default: 0)
 dack (unsigned int) (default: 1)
 dack.eq (bool) (default: false)
 dack.factor (double) (default: 0.1)
 dack.gc (unsigned int) (default: 2000)
 dack.gc_inv_decay (double) (default: 0.8)
 dack.threshold (unsigned int) (default: 10)
 delay_units (bool) (default: false)
 delay_units_threshold (unsigned int) (default: 32)
 dt_lazy_splits (unsigned int) (default: 1)
 lemma_gc_strategy (unsigned int) (default: 0)
 phase_caching_off (unsigned int) (default: 100)
 phase_caching_on (unsigned int) (default: 400)
 phase_selection (unsigned int) (default: 3)
 qi.eager_threshold (double) (default: 10.0)
 qi.lazy_threshold (double) (default: 20.0)
 qi.quick_checker (unsigned int) (default: 0)
 relevancy (unsigned int) (default: 2)
 restart_factor (double) (default: 1.1)
 restart_strategy (unsigned int) (default: 1)
 seq.max_unfolding (unsigned int) (default: 1000000000)
 seq.min_unfolding (unsigned int) (default: 1)
 seq.split_w_len (bool) (default: true)
</pre>