
feat: add a digest to R1CSShape#49

Merged
winston-h-zhang merged 6 commits into dev from r1cs_shape_digest
Sep 18, 2023

Conversation

@huitseeker (Contributor) commented Sep 16, 2023

What this does

  • refactors the DigestBuilder introduced in Add DigestBuilder. #40 into something that sets aside the builder pattern but is hopefully simpler: an unambiguous, transient DigestComputer that takes the (configurable) Digestible bytes and computes a digest from them
  • stores and caches this digest in the structure in a way that does not pollute the Digestible bytes, even for a struct that implements SimpleDigest. The cache also 1/ can't be populated twice, 2/ can't be befuddled by structs deserialized from incorrect digest information.
  • applies this pattern to add a (cached) digest to R1CSShape, which is orthogonal to the digest of the public parameters.
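For illustration, here is a minimal sketch of this caching pattern. All names are hypothetical stand-ins: `Shape` plays the role of `R1CSShape`, std's `OnceLock` stands in for `once_cell::sync::OnceCell`, and `DefaultHasher` stands in for the real digest function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::OnceLock;

// Stand-in for the PR's `Digestible`: a type that streams its binding bytes.
trait Digestible {
    fn write_bytes(&self, out: &mut Vec<u8>);
}

// Stand-in for `DigestComputer`: transient, takes the Digestible bytes and
// computes a digest from them (DefaultHasher here instead of SHA3).
struct DigestComputer;

impl DigestComputer {
    fn digest<T: Digestible>(t: &T) -> u64 {
        let mut bytes = Vec::new();
        t.write_bytes(&mut bytes);
        let mut h = DefaultHasher::new();
        bytes.hash(&mut h);
        h.finish()
    }
}

// A toy shape whose cached digest is excluded from its own bytes, so the
// cache never pollutes the digest, and can't be set twice.
struct Shape {
    coeffs: Vec<u64>,
    digest: OnceLock<u64>, // would be skipped during serialization in the real struct
}

impl Digestible for Shape {
    fn write_bytes(&self, out: &mut Vec<u8>) {
        // Only the mathematical content, never the cached digest field.
        for c in &self.coeffs {
            out.extend_from_slice(&c.to_le_bytes());
        }
    }
}

impl Shape {
    fn digest(&self) -> u64 {
        // get_or_init makes the cache idempotent: the closure runs at most once.
        *self.digest.get_or_init(|| DigestComputer::digest(self))
    }
}

fn main() {
    let s = Shape { coeffs: vec![1, 2, 3], digest: OnceLock::new() };
    let d1 = s.digest();
    let d2 = s.digest(); // second call returns the cached value
    assert_eq!(d1, d2);
    println!("digest = {d1:#x}");
}
```

Note that deserializing a struct with a bogus digest can't fool this scheme: the cached field is ignored by `write_bytes`, so recomputation always starts from the binding bytes.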

What this does not do

  • embed the digest of the R1CSShape constituents of PublicParameters inside their Digestible bytes, since, as discussed in review, this would require proper domain separation to be secure.

At the cost of needing to re-hash the R1CSShapes when computing the digest of PublicParams, this solves #31 as stated. Indeed, for full-fledged PPs, it allows serializing them to a file name built from the digest of their member R1CSShapes, and it allows failing fast when a lookup in that cache finds no file with the sought R1CSShape digest.

I'm not opposed to also nesting digests recursively, if this is done with proper domain separation (probably using something like serde_name as a domain separator for the constituent digests).

@winston-h-zhang (Contributor) commented Sep 17, 2023

@huitseeker I think this is essentially what we all agreed on in #31, but I'm now realizing there are some performance issues.

  1. Serializing compressed points. Serializing compressed points with bincode is still somewhat slow. From my recent experience running tests in Intergrate SuperNova's PublicParams, benchmark NIVC, and Stablize Param API lurk-lab/lurk-beta#648, generating the digest of the R1CSShape takes around 1/3 of the time of fully generating the params. This makes sense, since the two largest portions of the params are the shape and the commitment key, which are both O(num_cons), so I expect this 1:3 ratio to be unchanged at higher rc. This is a problem, because even at rc=300 the params take ~300 secs to generate, which means we're looking at ~100 secs to generate the shape digest as a cache key and look up its corresponding params. That would be a huge regression on the current timings.
  2. Shape memory consumption. Extending the 1:3 shape/param analysis: as supernova matures, the shape grows as O(total_cons), while the commitment key only grows as O(max_i(num_cons_i)), where total_cons = \sum_i num_cons_i. So in a world where supernova and coprocessors are mature in lurk, we can expect shape digest generation to dominate the computation -- at which point there would be no point in generating the shape digest as a cache key.

Given this, I'd like to hear everyone's thoughts. cc: @porcuquine

Mine are the following: I think digests should not be used as cache keys. Thus, we need to find another method of generating a good cache key, and that other method is orthogonal to the current PR. (If adding shape digests turns out to be useful only for generating a cache key and nowhere else in (super)nova, then this PR would lose its use case -- but that is again an orthogonal discussion.)

I think a good candidate is to generate a cache key from the abomonated bytes. This is conceptually totally distinct from a digest. First, it solves the performance issues. Second, you may be worried that abomonated bytes are not compatible across machines; this is not an issue, because the cache key and the cached param file should only live on your local machine, so there is never any conflict. Third, because this is not a "digest", we wouldn't keep the value in some abomonated_digest field in R1CSShape; it would only be a temporary value computed when we want to check the cache. So the code would be very similar to the original compute_digest function, except called compute_abomonated_key -- just a one-time function without any builder/traits involved.

@porcuquine (Contributor) commented Sep 17, 2023

Why is the shape digest dealing with compressed points? It should just be scalars, I think; and I would not expect that it should be slow. I'm open to not using the 'official digest': that's not important. But it's worth understanding why using a performant hash on a relatively small amount of simple data is not fast. It sounds like something else is going on.

@porcuquine (Contributor)

> I think a good candidate is to generate a cache key from abomonated bytes. This is conceptually totally distinct from a digest.

I don't think that, for purposes of cache keys, it matters exactly how we serialize and hash the shape. That said, I think any approach that involves 'serializing then hashing' is likely to perform worse than one that knows the structure of the data and can hash it without allocating (maybe 'abomonated bytes' have this property).

In other words, for purposes of cache keys, can't we just iterate over the fields and hash the data? I can't imagine that would take anything like 100 seconds.
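A sketch of that field-by-field idea, with a toy shape standing in for R1CSShape and std's DefaultHasher standing in for whatever fast hash would actually be used (both are hypothetical): each field is streamed straight into the hasher, with no intermediate serialization buffer.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy R1CSShape: a constraint count plus one sparse matrix stored as
// (row, col, coefficient) triples.
struct Shape {
    num_cons: usize,
    a: Vec<(usize, usize, u64)>,
}

// Hash the fields in place: iterate over the data and feed it to the
// hasher directly, instead of serializing to a Vec<u8> and hashing that.
fn cache_key(shape: &Shape) -> u64 {
    let mut h = DefaultHasher::new();
    shape.num_cons.hash(&mut h);
    for &(r, c, v) in &shape.a {
        r.hash(&mut h);
        c.hash(&mut h);
        v.hash(&mut h);
    }
    h.finish()
}

fn main() {
    let s = Shape { num_cons: 2, a: vec![(0, 0, 5), (1, 1, 7)] };
    println!("cache key = {:#x}", cache_key(&s));
}
```

Since this key only needs to be stable on one machine, no canonical serialization format is required, which is what makes the zero-allocation streaming possible.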

@winston-h-zhang (Contributor)

> Why is the shape digest dealing with compressed points? It should just be scalars, I think

Ah shoot, you're right. My analysis is wrong then, but I believe the numbers are correct. So there is something we're missing.

@winston-h-zhang (Contributor) commented Sep 17, 2023

Here are the timings on my machine for the current method:

| rc  | Circuit digest | Public params |
| --- | -------------- | ------------- |
| 2   | 530ms          | 1.943s        |
| 10  | 1.500s         | 4.279s        |
| 50  | 6.702s         | 18.349s       |
| 100 | 14.004s        | 37.515s       |
| 200 | 28.667s        | 78.322s      |
| 500 | 83.752s        | 284.734s      |

@huitseeker (Contributor, Author) commented Sep 17, 2023

@winston-h-zhang we already have a way to attach a cached, non-intrusive digest to some structs, including R1CSShape and PublicParameters. The function that passes them through a hash function today is their bincode serialization, but the design of Digestible makes it easy to customize in an ad-hoc fashion. So I guess what @porcuquine is asking is: is there a function (that is, an implementation of Digestible::write_bytes) that would stream binding bytes faster?

Note: in your benchmarks, you may want to measure the # of bytes in the input, to compare against the number of bytes per second of sha3 itself (and perhaps change this function if its throughput is unsuitable).

@porcuquine (Contributor)

Right. Also, if we are not worrying about compatibility with the security-critical digest, we can ask:

  • Does it need to be SHA3?
  • Are we using a fast implementation thereof?

If the hashing is the bottleneck, let's change it (for purposes of the cache-key).

@winston-h-zhang (Contributor)

Ok, after some more testing, I think my above conclusions were misleading. Sorry about that.

It's the synthesizing, rather than the hashing, that's slowing everything down.

```
pp start!                                    32s
  nova::circuit_digest                       11s
    <MultiFrame as StepCircuit>::synthesize   9s
  <MultiFrame as StepCircuit>::synthesize     9s
Public parameters took 32.898782583s
```

At rc=100, it takes 9 seconds to synthesize the circuit and 2 seconds to compute the hash. At rc=1, it takes around 0.1 seconds to synthesize the circuit and 0.05 seconds to compute the hash. Since Lurk is opaque to Nova, we are forced to pay this 100x cost. However, conceptually we could represent rc=100 with just (lurk_circuit_shape, 100) and save 100x the time in synthesizing. Given this, we could pursue a more complicated solution where lurk computes a circuit hash and passes it on to Nova, to have finer control over the performance.

Doing this, we at least guarantee that computing this cache key is constant time with respect to rc, and that, concretely, the timing stays under 5 secs to load cached params at high rc. It's also more scalable as supernova arrives and we start adding recursive coprocessors with their own rc, etc. What do you guys think?

cc: @porcuquine @huitseeker
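The proposed key derivation could be sketched as follows (all names hypothetical, with std's DefaultHasher as a stand-in hash): the expensive base-circuit digest is computed once on the Lurk side, and the cache key for any rc is then derived in constant time rather than by synthesizing rc copies of the circuit.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical composite cache key: the (expensive) base circuit digest is
// computed once upstream; deriving the key for a given rc is O(1) in rc.
fn composite_cache_key(base_circuit_digest: u64, rc: usize) -> u64 {
    let mut h = DefaultHasher::new();
    base_circuit_digest.hash(&mut h);
    rc.hash(&mut h);
    h.finish()
}

fn main() {
    let base = 0xdead_beef_u64; // stand-in for lurk's circuit shape digest
    let k1 = composite_cache_key(base, 1);
    let k100 = composite_cache_key(base, 100);
    assert_ne!(k1, k100); // different rc, different cached-params file
    println!("rc=1 -> {k1:#x}, rc=100 -> {k100:#x}");
}
```

The cost of deriving the key is independent of rc, which is what keeps cached-params lookup fast at high rc.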

@winston-h-zhang left a comment

Sorry for dragging this out. I think we've found a good way to move forward here, so hooray!

@winston-h-zhang winston-h-zhang added this pull request to the merge queue Sep 18, 2023
Merged via the queue into dev with commit 4de7294 Sep 18, 2023
@winston-h-zhang winston-h-zhang deleted the r1cs_shape_digest branch September 18, 2023 21:10
github-actions bot pushed a commit that referenced this pull request Sep 26, 2023
* Add DigestBuilder.

* Make digest and claims private.

* refactor: Refactor DigestBuilder

- Refactored `src/digest.rs` to replace `Vec<u8>` storage with dedicated Write I/O.
- Removed optional `hasher` and introduced dedicated factory method.
- Reworked digest computation and mapping into separate functions.
- Merged build and digest computation to enhance coherence.
- Improved type safety with Result error propagation.

* Propagate DigestBuilder changes.

* Fix tests.

* Correct assertion for OutputSize scale.

* Remove commented.

* Remove dbg!.

* Fixup rebase.

---------

Co-authored-by: porcuquine <porcuquine@users.noreply.github.com>
Co-authored-by: François Garillot <francois@garillot.net>

feat: add a digest to R1CSShape (#49)

* refactor: Refactor Digestible trait

- Removed `to_bytes` method from the `Digestible` trait in `src/digest.rs` file.

* fix: Make bincode serialization in digest.rs more rigorous

- Updated `bincode::serialize_into(byte_sink, self)` with a configurable version to enable "little endian" and "fixint encoding" options.
- Added a comment in `src/digest.rs` about `bincode`'s recursive length-prefixing during serialization.

* refactor: Refactor digest computation using `OnceCell` and `DigestComputer`

This gives up on a generic builder and instead uses an idempotent `OnceCell`
+ a generic digest computer to populate the digest of a structure.

- this shows how to set up digest computation so it doesn't depend on the digest field,
- the digest can't be set twice,
- an erroneous digest can't be inferred from the serialized data.

In detail:

- Overhauled digest functionality in multiple files by replacing `DigestBuilder` with `DigestComputer`, significantly altering the handling of hashes.
- Incorporated `once_cell::sync::OnceCell` and `ff::PrimeField` dependencies to improve performance and simplify code.
- Modified `VerifierKey` and `RunningClaims` structures to include a `OnceCell` for digest, leading to a change in function calls and procedures.
- Simplified `setup_running_claims` by removing error handling and directly returning `RunningClaims` type.
- Adapted test functions according to the changes including the removal of unnecessary unwrapping in certain scenarios.
- Updated Cargo.toml with the new dependency `once_cell` version `1.18.0`.

* refactor: rename pp digest in VerifierKey to pp_digest

* feat: add a digest to R1CSShape

* fix: Small issues

- Introduced a new assertion within the `write_bytes` method of `src/supernova/mod.rs` for validating whether the `claims` are empty
- Improved code comment clarity regarding the creation of a running claim in `src/supernova/mod.rs`.

Co-authored-by: porcuquine <1746729+porcuquine@users.noreply.github.com>