@andrewwhitehead
Contributor

@andrewwhitehead andrewwhitehead commented Aug 14, 2024

Another PR to review, sorry...

This PR adds Karatsuba multiplication and squaring algorithms for both Uint and BoxedUint. The BoxedUint implementation is fairly flexible, and multiplication is supported for mixed operand sizes. The constants used may need tweaking, especially for 32-bit platforms, to ensure the best implementation is chosen. On my 64-bit platform, the current numbers are:

BoxedUint mul:
2048 bits: -6% (940ns)
3840 bits: -25% (2.7µs)
4096 bits: -3% (2.0µs)
8192 bits: -42% (9.6µs)

(The improvement is not linear in the operand size, although the relative timings did become more linear with the integer size.)

BoxedUint square:
4096 bits: -3% (2.0µs)
7680 bits: -19% (5.9µs)
8192 bits: -24% (6.5µs)
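For context on the technique named above, here is a minimal sketch of a single Karatsuba step. It is not the code from this PR: it works on one `u64` split into two 32-bit halves rather than on limb slices, and the function name `karatsuba_mul_u64` is made up for illustration. It only shows how three half-size multiplications replace the four used by the schoolbook method.

```rust
/// One level of Karatsuba on two 64-bit values (illustrative only, not the
/// PR's implementation). Each operand is split as x = x1 * 2^32 + x0.
fn karatsuba_mul_u64(a: u64, b: u64) -> u128 {
    const HALF_BITS: u32 = 32;
    const MASK: u64 = (1u64 << HALF_BITS) - 1;

    let (a1, a0) = (a >> HALF_BITS, a & MASK);
    let (b1, b0) = (b >> HALF_BITS, b & MASK);

    // Two of the three multiplications: low halves and high halves.
    // Each factor is below 2^32, so the products fit in a u64.
    let z0 = a0 * b0;
    let z2 = a1 * b1;

    // Third multiplication: the sums are below 2^33, so the product
    // needs more than 64 bits and is computed in u128.
    let mid = (a0 + a1) as u128 * (b0 + b1) as u128;

    // Cross term a0*b1 + a1*b0, recovered without a fourth multiplication.
    let z1 = mid - z0 as u128 - z2 as u128;

    // Recombine: z2 * 2^64 + z1 * 2^32 + z0.
    ((z2 as u128) << (2 * HALF_BITS)) + (z1 << HALF_BITS) + z0 as u128
}
```

Applied recursively to limb slices, the same identity is what makes the approach pay off only above some operand size, since the extra additions and subtractions have a fixed cost.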

The Uint implementation is more limited due to the strict typing, and is currently only implemented for split_mul with 16-, 32-, 64-, or 128-limb arguments (U1024, U2048, U4096, U8192 on 64-bit), as well as square_wide with 64- or 128-limb arguments.

Uint split_mul:
U1024 x U1024: -11% (182ns)
U2048 x U2048: -37% (610ns)
U4096 x U4096: -47% (2.2µs)
U8192 x U8192: -53% (7.5µs)

Uint square_wide:
U4096: -5% (2.0µs)
U8192: -16% (6.7µs)
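The limb counts quoted for Uint follow from the limb width, which is why the constants may need retuning on 32-bit targets. A small sketch of the arithmetic, using a hypothetical helper `nlimbs_for_bits` that is not part of the crate:

```rust
/// Number of limbs needed to hold `bits` bits when each limb is
/// `limb_bits` wide. Hypothetical helper for illustration only.
const fn nlimbs_for_bits(bits: usize, limb_bits: usize) -> usize {
    // Round up so that a partial limb still counts as a full one.
    (bits + limb_bits - 1) / limb_bits
}
```

With 64-bit limbs, 1024 bits is 16 limbs and 8192 bits is 128 limbs, matching the sizes listed above. With 32-bit limbs the same 1024-bit integer occupies 32 limbs, so a threshold expressed in limbs corresponds to half as many bits.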

@andrewwhitehead
Contributor Author

@dignifiedquire I probably could have copied some of the more extensive documentation from your PR; maybe there's something there you'd like to add.

@@ -0,0 +1,414 @@
//! Karatsuba multiplication
Member

Small nit but I'd probably suggest putting this under mul/karatsuba.rs

Contributor Author

src/uint/mul or src/mul?

Member

src/uint/mul

Signed-off-by: Andrew Whitehead <[email protected]>
let size = self.nlimbs() + rhs.nlimbs();
let overlap = self.nlimbs().min(rhs.nlimbs());

if self.nlimbs().min(rhs.nlimbs()) >= 32 {
Member

Where does 32 come from? Perhaps it could use a constant?

Contributor Author

It comes from benchmarking on this machine: the reduction is only applied where it's likely to be faster.

let mut limbs = vec![Limb::ZERO; self.nlimbs() * 2];
let size = self.nlimbs() * 2;

if self.nlimbs() >= 64 {
Member

Likewise here, this could probably also use a constant (or short expression like KARATSUBA_MIN_LIMBS * 2 or thereabouts)
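One possible shape for the named constants suggested in these two review comments is sketched below. The name `KARATSUBA_MIN_LIMBS` and the `* 2` expression come from the review; the values 32 and 64 come from the quoted diff lines. Everything else (`KARATSUBA_SQUARE_MIN_LIMBS`, `MulStrategy`, the two functions, and the "schoolbook" label for the fallback) is assumed for illustration and may not match what was merged.

```rust
/// Minimum size of the smaller operand, in limbs, before Karatsuba is used
/// for multiplication. 32 is the benchmark-derived value from the diff.
const KARATSUBA_MIN_LIMBS: usize = 32;

/// Squaring threshold written as a short expression of the multiplication
/// threshold (hypothetical name); evaluates to the 64 used in the diff.
const KARATSUBA_SQUARE_MIN_LIMBS: usize = KARATSUBA_MIN_LIMBS * 2;

/// Which algorithm a call would dispatch to (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MulStrategy {
    Schoolbook,
    Karatsuba,
}

/// Mirrors `self.nlimbs().min(rhs.nlimbs()) >= 32` from the diff.
fn mul_strategy(lhs_limbs: usize, rhs_limbs: usize) -> MulStrategy {
    if lhs_limbs.min(rhs_limbs) >= KARATSUBA_MIN_LIMBS {
        MulStrategy::Karatsuba
    } else {
        MulStrategy::Schoolbook
    }
}

/// Mirrors `self.nlimbs() >= 64` from the diff.
fn square_strategy(limbs: usize) -> MulStrategy {
    if limbs >= KARATSUBA_SQUARE_MIN_LIMBS {
        MulStrategy::Karatsuba
    } else {
        MulStrategy::Schoolbook
    }
}
```

Keeping both thresholds tied to one constant means a platform-specific retune (for example on 32-bit targets) only has to touch a single value.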

Signed-off-by: Andrew Whitehead <[email protected]>
@tarcieri tarcieri requested a review from fjarri August 16, 2024 20:42
@tarcieri tarcieri merged commit de72555 into RustCrypto:master Aug 16, 2024
@andrewwhitehead andrewwhitehead deleted the feat/karatsuba branch August 17, 2024 00:27
@tarcieri tarcieri mentioned this pull request Jan 22, 2025