Commit 8fe22cb
Inline low level multiplication and reduction functions (#776)
Fixes a performance regression introduced in #667. Evidently, the compiler
relies heavily on knowing the slice sizes at compile time, so I'm inlining
`schoolbook_multiplication()`, `schoolbook_squaring()`, and
`montgomery_reduction_inner()` so that the compiler can optimize them when
they are called on fixed-size `Uint`s.
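To illustrate the idea, here is a minimal sketch of a slice-based schoolbook multiplication marked `#[inline(always)]` and called from a fixed-size wrapper. The names `schoolbook_mul`, `mul_u256`, `Limb`, and `Wide` are hypothetical and do not reflect the crate's actual signatures; the sketch only shows why inlining can help. Once the slice routine is inlined into a caller that passes arrays of known length, the loop bounds become compile-time constants, which gives the compiler the opportunity to unroll the loops and drop bounds checks.

```rust
// Hypothetical limb types for this sketch (not the crate's definitions).
type Limb = u64;
type Wide = u128;

/// Slice-based schoolbook multiplication: writes `lhs * rhs` into `out`.
/// Without inlining, the lengths are only known at run time.
#[inline(always)]
fn schoolbook_mul(lhs: &[Limb], rhs: &[Limb], out: &mut [Limb]) {
    if out.len() != lhs.len() + rhs.len() {
        panic!("output length must equal the sum of the operand lengths");
    }
    for limb in out.iter_mut() {
        *limb = 0;
    }
    for i in 0..lhs.len() {
        let mut carry: Limb = 0;
        for j in 0..rhs.len() {
            // a*b + c + d cannot overflow u128 for 64-bit a, b, c, d.
            let t = (lhs[i] as Wide) * (rhs[j] as Wide)
                + (out[i + j] as Wide)
                + (carry as Wide);
            out[i + j] = t as Limb; // low 64 bits
            carry = (t >> 64) as Limb; // high 64 bits
        }
        // This position has not been written by earlier rows, so a plain store is enough.
        out[i + rhs.len()] = carry;
    }
}

/// Fixed-size caller: after inlining, every length above is the constant 4 or 8.
fn mul_u256(a: &[Limb; 4], b: &[Limb; 4]) -> [Limb; 8] {
    let mut out = [0; 8];
    schoolbook_mul(a, b, &mut out);
    out
}
```

Calling `mul_u256` with two 4-limb arrays returns the 8-limb product in little-endian limb order. Whether the compiler actually unrolls the loops depends on the target and optimization level, so any speedup should be confirmed with benchmarks.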
Benchmarks:
- `wrapping ops/split_mul, U256xU256` - 26ns to 9ns
- `Const Montgomery arithmetic/multiplication, U256*U256` - 41ns to 21ns
- `Dynamic Montgomery arithmetic/multiplication, U256*U256` - 62ns to
44ns
The effect is less pronounced for longer integers, but still amounts to
a 5-10% speedup for U4096.
On a higher level, this affects many `crypto-primes` benchmarks; e.g.,
it doubles the speed of the Lucas test for U128.
Possible addition: I think `panic!` in these functions can be replaced
with `debug_assert!`, but I don't insist on it.
1 parent d668d41 commit 8fe22cb
2 files changed
Lines changed: 3 additions & 0 deletions
Diff contents were not preserved; only the line positions are recoverable:
- First file: one line added at new line 9.
- Second file: one line added at new line 20 and one at new line 56.