Acceleration of SpMVM in folding

### Background:
Both the Nova project and our for Arecibo consider performance of folding to be critical. During our recent analysis, we noticed that the commitment to cross-terms is a significant component of the folding performance. This observation holds true even when the costs are, for large enough ops, primarily dominated by an MSM operation, as predicted by theoretical analysis.

### Findings:

The texray graph showcasing our investigation's results is as follows:
```
RecursiveSNARK::prove_step                       5s     ├───────────────────────────────────────────────────────────────┤
    <MultiFrame as StepCircuit>::synthesize      1s     ├────────────┤
    <_ as Group>::vartime_multiscalar_mul      947ms                  ├─────────┤
    NIFS::prove                                  3s                              ├─────────────────────────────────────┤
        AZ_1, BZ_1, CZ_1                         1s                              ├─────────────┤
        AZ_2, BZ_2, CZ_2                       767ms                                             ├───────┤
        cross terms                            263ms                                                      ├─┤
        T                                      202ms                                                         ├┤
        <_ as Group>::vartime_multiscalar_mul  674ms                                                           ├──────┤
    <_ as Group>::vartime_multiscalar_mul        5ms                                                                    ┆
```
@winston-h-zhang to provide more information on reproduction here

- The commitment to cross-terms is evident at the following location in the codebase:
[arecibo/src/nifs.rs#L52C1-L53](https://github.com/lurk-lab/arecibo/blob/83982b3babe714d9d1370cbdea1c91945062c31c/src/nifs.rs#L52C1-L53)

- The primary computational complexity in this function arises when computing a matrix-vector product between the R1CS matrices and an incoming instance-witness pair. This is visible at:
[arecibo/src/r1cs/mod.rs#L285-L301](https://github.com/lurk-lab/arecibo/blob/81320d71d050e8d1f4e4580cef269596b8623376/src/r1cs/mod.rs#L285-L301)

- The matrix-vector product terms are represented as AZ_1, BZ_1, CZ_1 and AZ_2, BZ_2, CZ_2 in the texray graph.

- To optimize the operation's speed, [we transitioned the matrix representation from COO to CSR](https://github.com/lurk-lab/arecibo/pull/38) given that they are represented in sparse form.

### Challenges:

While the MSM operation can be GPU-accelerated (as seen in the [pasta-msm project](https://github.com/supranational/pasta-msm/tree/main)), the field multiplications involved in the matrix-vector product are currently not.

### Proposed Solution:

It's imperative to accelerate these field multiplications to achieve optimal performance for the folding operation.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Acceleration of SpMVM in folding #75

Background:

Findings:

Challenges:

Proposed Solution:

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Acceleration of SpMVM in folding #75

Description

Background:

Findings:

Challenges:

Proposed Solution:

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions