CSR Representation for Sparse Matrices#38
Conversation
369c4f8 to
f53bc1f
Compare
huitseeker
left a comment
There was a problem hiding this comment.
I think the core step here would be to not involve a COO -> CSR conversion but convert straight to CSR.
|
Note: there are a few places where I don't do the most optimal calculation; I've neglected them in favor of getting this draft PR out. However, everything on the "critical path" is as one would theoretically expect. Maybe there are more concrete optimizations to be made. I also rebased on #34, to save the hassle of public params changing between feature switching due to the difference representations. |
Given the motivation for this change, and that it will change digests, etc., I would love to see a stronger indication that the current implementation will produce the expected performance improvement. I fully accept that we have solid theoretical ground for expecting it, but it would be worth working out enough of the implementation to be sure we won't need more digest-affecting changes in order to realize the gains. |
|
Here are some concrete gains in lurk-rs for proving times, bench sha256_ivc_prove/2023-09-05:7b1b685d5d39892eae74080ea179f2e1654a6f5e:rc=10:sha256_ivc_1/1
time: [9.3756 s 9.4571 s 9.5291 s]
sha256_ivc_prove/2023-09-05:7b1b685d5d39892eae74080ea179f2e1654a6f5e:rc=100:sha256_ivc_1/1
time: [6.2255 s 6.3031 s 6.3943 s]CSR: sha256_ivc_prove/2023-09-05:7b1b685d5d39892eae74080ea179f2e1654a6f5e:rc=10:sha256_ivc_1/1
time: [8.4676 s 8.6431 s 8.7572 s]
change: [-10.421% -8.6069% -7.0613%] (p = 0.00 < 0.05)
Performance has improved.
sha256_ivc_prove/2023-09-05:7b1b685d5d39892eae74080ea179f2e1654a6f5e:rc=100:sha256_ivc_1/1
time: [5.3944 s 5.5193 s 5.6346 s]
change: [-14.883% -12.435% -10.207%] (p = 0.00 < 0.05)
Performance has improved. |
Ah, perfect — thanks. I think that's sufficient to show this is manifestly worthwhile. If we can gain confidence in the concrete representation being stable, I'd suggest we don't need a feature flag to accept this. |
huitseeker
left a comment
There was a problem hiding this comment.
Agreed with @porcuquine I think this is ripe for a CSR-only implementation, provided this uses CSR from the start.
The conversion is indeed just part of setup, but when we're dealing with foldings that run in <40ms, I expect the additional handful of ms involved in this unnecessary conversion to show a difference.
It doesn't matter how long one folding takes. If we are folding many times, then the one-time cost of processing circuit shapes should not be a big deal. In other words, at the scale of proofs that will take seconds or minutes to generate, then ms are not that big a deal. However, I agree that we should just materialize whatever we are going to use, as a matter of efficiency in principal. The reason I mention is that the only case (as I see it), where the one-time cost of creating the shape is significant, is the case where the whole proof is very very small. I'm trying to emphasize that this is not a very important case in terms of how we focus on optimization. So even though I agree it's worth this savings, the much more important point is to ensure that whatever we do here is indeed only being performed once, when creating parameters for a given circuit. |
I agree, but that is very precisely the case chosen for the initial benchmarks @winston-h-zhang reported, which created (and arguably still creates) some noise in communication. The work of creating matrices in any format is already performed once, and nothing about this PR changes that. That one-time work is however creating an intermediary representation in this PR that does not need to be there. |
|
The logical implication of what I'm saying is that if we want to justify the work based on measured performance rather than theory, then we should choose a representative benchmark (like the ones from |
huitseeker
left a comment
There was a problem hiding this comment.
This work seems completely orthogonal to adding a digest to the R1CS shapes. In order to merge this quickly, I would recommend rebasing this PR so that it only contains the commits relevant to the CSR representation. There are two equivalent ways of doing this:
git rebase --onto(ref)git rebase --interactive origin/master(and then remove all the irrelevant commits)
|
Yes, sorry the digest work was just for testing on lurk-rs. Let me rebase |
85c2350 to
9cefd6f
Compare
a41142f to
c90d9c4
Compare
|
!benchmark |
huitseeker
left a comment
There was a problem hiding this comment.
This needs a rebase, so back to your queue.
I have a rebased branch, which I built for review, if that helps https://github.com/huitseeker/Nova/tree/csr-mvm-feature-review
Further, the iterator is OK when you really need to iterate over every element in a non-amiguous way1, e.g. when computing a hash. But it shouldn't be a crutch moving us away from using the native CSR shape for all arithmetic operations, which I think is possible here, in snark.rs and everything I've seen so far in ppsnark.rs.
Footnotes
-
i.e. where interpreting the order of arrival of the data must not require having context about the matrix representation ↩
c8005fe to
44887b3
Compare
64b0f69 to
503d86f
Compare
5a67864 to
7ca069f
Compare
|
!benchmark |
|
!benchmark |
Benchmark for 9d5a73bClick to view benchmark
|
Benchmark for 9d5a73bClick to view benchmark
|
Benchmark for 9d5a73bClick to view benchmark
|
Hmm, my benchmarks show a very consistent improvement in the Recursive benches: |
There was a problem hiding this comment.
I think this is close to done. The only blocking items are:
the initialization of the inner vector in,fn new- adjusting
test_matrix_vector_multiplicationandtest_matrix_iterin any of the ways I suggested, or equivalent.
I've left a range of suggestions for those tests, but I'll be happy with a minimal fix.
b8627bf to
5252452
Compare
5252452 to
24b0f3d
Compare
huitseeker
left a comment
There was a problem hiding this comment.
Tip top! thank you so much!
* CSR Representation for Sparse Matrices (#38) * add csr feature * remove feature remove conversions * cleanup * avoid iter() and go row-by-row * cleanup and apply suggestions * add R1CSShape padding test * add more suggestions * refactor: Refactor sparse namespace and SparseMatrix visibility - Modifies the visibility of `sparse` module from public to private - Makes `SparseMatrix` class within `sparse` module public - Updates `SparseMatrix` import across several files to accommodate visibility changes * fix: define matrix inputs in idiomatic Nova style * chore: Refactor dependencies for r1cs module - Shifted `proptest` and `rand` dependencies to dev-dependencies in `Cargo.toml` - Introduced testing flag for `util` module under `r1cs` --------- Co-authored-by: Hanting Zhang <[email protected]>
Resolves #37.
This PR implements CSR matrix-vector multiplication under the
"csr"feature. (This was for me to debug; if CSR sould become the only representation, then we can simply replace and remove the feature.) We get a slight uptick in performance, around 10%.Notes:
There are many sparse matrix crates available in rust. However, most of these crates are do not parallelize, or are not compatiable with field elements. For example, sprs does not parallelize its sparse matrix / dense vector multiplication. It's traits also require
num_traits::Zero, which is not compatiable withff::PrimeField.CSR:
No CSR: