-
Notifications
You must be signed in to change notification settings - Fork 37
Acceleration of SpMVM in folding #75
Description
Background:
Both the Nova project and our for Arecibo consider performance of folding to be critical. During our recent analysis, we noticed that the commitment to cross-terms is a significant component of the folding performance. This observation holds true even when the costs are, for large enough ops, primarily dominated by an MSM operation, as predicted by theoretical analysis.
Findings:
The texray graph showcasing our investigation's results is as follows:
RecursiveSNARK::prove_step 5s ├───────────────────────────────────────────────────────────────┤
<MultiFrame as StepCircuit>::synthesize 1s ├────────────┤
<_ as Group>::vartime_multiscalar_mul 947ms ├─────────┤
NIFS::prove 3s ├─────────────────────────────────────┤
AZ_1, BZ_1, CZ_1 1s ├─────────────┤
AZ_2, BZ_2, CZ_2 767ms ├───────┤
cross terms 263ms ├─┤
T 202ms ├┤
<_ as Group>::vartime_multiscalar_mul 674ms ├──────┤
<_ as Group>::vartime_multiscalar_mul 5ms ┆
@winston-h-zhang to provide more information on reproduction here
-
The commitment to cross-terms is evident at the following location in the codebase:
arecibo/src/nifs.rs#L52C1-L53 -
The primary computational complexity in this function arises when computing a matrix-vector product between the R1CS matrices and an incoming instance-witness pair. This is visible at:
arecibo/src/r1cs/mod.rs#L285-L301 -
The matrix-vector product terms are represented as AZ_1, BZ_1, CZ_1 and AZ_2, BZ_2, CZ_2 in the texray graph.
-
To optimize the operation's speed, we transitioned the matrix representation from COO to CSR given that they are represented in sparse form.
Challenges:
While the MSM operation can be GPU-accelerated (as seen in the pasta-msm project), the field multiplications involved in the matrix-vector product are currently not.
Proposed Solution:
It's imperative to accelerate these field multiplications to achieve optimal performance for the folding operation.