fix: Make nargo::ops::transform_program idempotent #6695
aakoshh merged 13 commits into 6670-test-transform-is-idempotent
TomAFrench left a comment:
So we need two extra passes to stabilise the optimisation currently. An example where 1 extra pass is not enough was conditional_1.
Performing the full transformation step again is a little overkill. The only non-idempotent optimization is #6668 afaik so if anything we should run multiple instances of this but avoid the rest.
Agreed, I just tried that as a quick way to see if there is an upper bound now.
I pushed a commit which allows multiple passes with the
For example, in the following test it looks like the MEO (MergeExpressionOptimizer) only runs once, which suggests there is another source of unstable output:
cargo test -p nargo_cli --bins -- test_transform_program_is_idempotent to_be_bytes
Description
Problem*
Followup for #6694
Summary*
Makes nargo::ops::transform_program idempotent by adding two loops, both up to 3 passes, exiting early if the opcode hash doesn't change:
- acvm::compiler::compile that includes optimize_internal and transform_internal
- acvm::compiler::transformers::transform_internal
Additional Context
Here's the journey I went through investigating the source of changes across optimisation runs.
Ever increasing current_witness_index
On the slice_loop example the Circuit::current_witness_index field increased by 2 after each pass. This seems to be because:
The PR adds a brute force step to visit each opcode and collect the remaining witnesses to set the next one correctly.
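The brute-force step described above could look roughly like the following. This is only a sketch: `Witness`, `Opcode` and `recompute_current_witness_index` are hypothetical, heavily simplified stand-ins rather than the actual ACIR types or the function added by the PR, and a real circuit would presumably also need to account for witnesses that appear only in parameters or return values.

```rust
/// Hypothetical stand-in for an ACIR witness index (assumption, not the real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Witness(u32);

/// Hypothetical, simplified opcode: only the witnesses it references matter here.
#[derive(Debug, Clone)]
enum Opcode {
    AssertZero { witnesses: Vec<Witness> },
    Range { input: Witness },
}

impl Opcode {
    /// All witnesses referenced by this opcode.
    fn witnesses(&self) -> Vec<Witness> {
        match self {
            Opcode::AssertZero { witnesses } => witnesses.clone(),
            Opcode::Range { input } => vec![*input],
        }
    }
}

/// Visit every opcode, collect the witnesses that are still referenced and
/// return the highest index, or `None` if no opcode references a witness.
/// The result depends only on the opcodes, so repeated passes cannot make it drift.
fn recompute_current_witness_index(opcodes: &[Opcode]) -> Option<u32> {
    opcodes
        .iter()
        .flat_map(|opcode| opcode.witnesses())
        .map(|witness| witness.0)
        .max()
}
```

Deriving the index from what the opcodes actually reference, instead of carrying a counter forward from one pass to the next, is what removes the growth between passes in this sketch.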
Alternatively we could implement PartialEq for Circuit in a way that it ignores current_witness_index.
After this fix we have the following tests still flagging the transformation as non-idempotent:
- 7_function
- conditional_1
- fold_fibonacci
- hashmap
- regression_5252
- regression_6451
- sha256
- sha256_regression
- sha256_var_size_regression
- sha256_var_witness_const_regression
- slices
- to_be_bytes
Multiple passes
For the above I found that adding the following to the test makes all of them pass:
So we currently need two extra full passes to stabilise the optimisation. An example where 1 extra pass is not enough was conditional_1. Ideally we would find which part of the transformation needs to be repeated to avoid having to do a full pass; our initial hypothesis was to look at #6668.
I had to add a loop around transform_internal allowing 3 passes before the tests indicated more stability than before. The expectation was that we only need to loop around MergeExpressionOptimizer, but that doesn't seem to be true. Even after this, the following two programs fail:
- 7_function
- slices
The difference in both cases is the removal of a range check:
For these to stabilise I had to move the loop to be around the entire transformation that includes the initial backend agnostic optimize_internal as well:
The final solution has an inner and an outer loop, treating both as black boxes.
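As an illustration of the inner/outer loop structure, here is a minimal sketch of a bounded fixed-point loop that exits early when a hash of the opcodes stops changing. Everything in it is an assumption made for illustration: `run_until_stable`, `optimize_sketch`, `transform_sketch` and `compile_sketch` are hypothetical names, the opcodes are modelled as a plain `Vec<u32>`, and the two toy passes merely stand in for optimize_internal and transform_internal; this is not the code from the PR.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Upper bound on passes for both loops (the PR description mentions up to 3).
const MAX_PASSES: usize = 3;

/// Hash a value so two versions can be compared cheaply.
fn fingerprint<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Apply `pass` at most `max_passes` times, stopping as soon as a pass
/// leaves the hash of the value unchanged.
fn run_until_stable<T: Hash>(mut value: T, max_passes: usize, mut pass: impl FnMut(T) -> T) -> T {
    let mut previous = fingerprint(&value);
    for _ in 0..max_passes {
        value = pass(value);
        let current = fingerprint(&value);
        if current == previous {
            break;
        }
        previous = current;
    }
    value
}

/// Toy stand-in for a backend agnostic optimisation: removes the first adjacent duplicate.
fn optimize_sketch(mut opcodes: Vec<u32>) -> Vec<u32> {
    if let Some(i) = opcodes.windows(2).position(|w| w[0] == w[1]) {
        opcodes.remove(i);
    }
    opcodes
}

/// Toy stand-in for a backend specific transformation: drops one trailing zero.
fn transform_sketch(mut opcodes: Vec<u32>) -> Vec<u32> {
    if opcodes.last() == Some(&0) {
        opcodes.pop();
    }
    opcodes
}

/// Outer loop around optimise + transform, inner loop around the transform only,
/// with both steps treated as black boxes.
fn compile_sketch(opcodes: Vec<u32>) -> Vec<u32> {
    run_until_stable(opcodes, MAX_PASSES, |ops| {
        let ops = optimize_sketch(ops);
        run_until_stable(ops, MAX_PASSES, transform_sketch)
    })
}
```

With these toy passes, `compile_sketch(vec![1, 1, 2, 0, 0])` settles on `[1, 2]`, and feeding that result back in returns it unchanged. The pass limit is a cap rather than a guarantee: an input that needs more passes than the bound allows would still come out unstable.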
Documentation*
Check one:
PR Checklist*
cargo fmt on default settings.