Faster vartime division for Uint
#608
Signed-off-by: Andrew Whitehead <[email protected]>
It looks like
Signed-off-by: Andrew Whitehead <[email protected]>
src/uint/div.rs
    // If the subtraction borrowed, then decrement q and add back the divisor
    let ct_borrow = ConstChoice::from_word_mask(borrow.0);
    carry = Limb::ZERO;
    i = 0;
    while i < yc {
        (x[xi + i + 1 - yc], carry) =
            x[xi + i + 1 - yc].adc(Limb::select(Limb::ZERO, y[i], ct_borrow), carry);
        i += 1;
    }
    quo -= ct_borrow.select_word(0, 1);
I can't actually get this to trigger in practice (except by incrementing quo earlier). That's probably because "the probability that the adding back in step L5 must be executed is of order 2/b", i.e. 1/2**31 or 1/2**63 in this case.
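One way to exercise the add-back branch is to shrink the limb. The following is only a sketch of textbook Knuth Algorithm D on base-256 digits, not the code in this PR: `div_rem_knuth` and its return shape are hypothetical, it uses plain branches instead of the constant-time `Limb`/`ConstChoice` operations, and it assumes little-endian digits with a nonzero top divisor digit. With b = 256 the order-2/b estimate suggests a correction roughly once per hundred quotient digits, and only for divisors of three or more digits (with two digits the estimate check is already exact).

```rust
/// Toy Knuth Algorithm D on little-endian base-256 digits (hypothetical helper).
/// Requires y.len() >= 2, x.len() >= y.len() and a nonzero top digit in y.
/// Returns (quotient, remainder, number of add-back corrections).
fn div_rem_knuth(x: &[u8], y: &[u8]) -> (Vec<u8>, Vec<u8>, usize) {
    let (m, n) = (x.len(), y.len());
    assert!(n >= 2 && m >= n && y[n - 1] != 0);
    const B: u32 = 256;

    // Normalize: shift left so the top divisor digit has its high bit set.
    let s = y[n - 1].leading_zeros();
    let shl = |d: &[u8], extra: usize| -> Vec<u8> {
        let mut out = vec![0u8; d.len() + extra];
        let mut carry = 0u32;
        for (i, &v) in d.iter().enumerate() {
            let t = ((v as u32) << s) | carry;
            out[i] = t as u8;
            carry = t >> 8;
        }
        if extra == 1 {
            out[d.len()] = carry as u8;
        }
        out
    };
    let vn = shl(y, 0);
    let mut un = shl(x, 1);

    let mut q = vec![0u8; m - n + 1];
    let mut add_backs = 0;
    for j in (0..=m - n).rev() {
        // Estimate the quotient digit from the top two dividend digits.
        let top = (un[j + n] as u32) * B + (un[j + n - 1] as u32);
        let mut qhat = top / (vn[n - 1] as u32);
        let mut rhat = top % (vn[n - 1] as u32);
        while qhat >= B || qhat * (vn[n - 2] as u32) > B * rhat + (un[j + n - 2] as u32) {
            qhat -= 1;
            rhat += vn[n - 1] as u32;
            if rhat >= B {
                break;
            }
        }

        // Multiply and subtract qhat * vn from the current window of un.
        let mut borrow = 0i32;
        let mut carry = 0u32;
        for i in 0..n {
            let p = qhat * (vn[i] as u32) + carry;
            carry = p >> 8;
            let t = (un[i + j] as i32) - borrow - ((p & 0xFF) as i32);
            un[i + j] = t as u8;
            borrow = (t < 0) as i32;
        }
        let t = (un[j + n] as i32) - borrow - (carry as i32);
        un[j + n] = t as u8;

        q[j] = qhat as u8;
        if t < 0 {
            // The subtraction borrowed: decrement the digit and add the divisor back.
            q[j] -= 1;
            add_backs += 1;
            let mut c = 0u32;
            for i in 0..n {
                let sum = (un[i + j] as u32) + (vn[i] as u32) + c;
                un[i + j] = sum as u8;
                c = sum >> 8;
            }
            un[j + n] = un[j + n].wrapping_add(c as u8);
        }
    }

    // Undo the normalization shift to recover the remainder.
    let mut r = vec![0u8; n];
    for i in 0..n {
        let pair = ((un[i + 1] as u32) << 8) | (un[i] as u32);
        r[i] = (pair >> s) as u8;
    }
    (q, r, add_backs)
}
```

For example, dividing the digits `[0, 254, 0, 128]` by `[255, 0, 128]` takes the add-back path once: the first quotient digit is estimated as 1 and has to be corrected to 0.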
Signed-off-by: Andrew Whitehead <[email protected]>
That's a good point... perhaps there should be both
This is definitely much faster than what I have there, and as for the naming stuff, I'll make a separate PR.
    }

    rem
    self.div_rem_vartime(rhs).1
I wonder if manually modifying div_rem_vartime() with the assumption that the quotient is discarded produces a faster result.
I don't believe there would be much benefit; it would just avoid storing the quotient words and then shifting them at the end. I think I also tried an inlined version to see if the optimizer would catch anything, and it didn't seem to have any impact.
    ///
    /// When used with a fixed `rhs`, this function is constant-time with respect
    /// to `self`.
    pub const fn rem_wide_vartime(lower_upper: (Self, Self), rhs: &NonZero<Self>) -> Self {
Could this be reduced to div_rem_vartime() by widening rhs to double width, and then halving the result? So that you don't have to spell out the whole algorithm again.
I don't think that can be done without extra trait bounds or const evaluation support.
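For illustration only, here is what the suggested reduction looks like on primitive integers, where a double-width type (`u128`) already exists and no extra trait bounds are needed. `rem_wide_u64` is a hypothetical name and is not part of this crate; for a generic `Uint` the double-width type has to be named through traits, which is where the difficulty mentioned above comes in.

```rust
use std::num::NonZeroU64;

/// Hypothetical primitive analogue of a wide remainder: reduces the 128-bit
/// value `hi * 2^64 + lo` modulo a 64-bit divisor.
fn rem_wide_u64(lower_upper: (u64, u64), rhs: NonZeroU64) -> u64 {
    let (lo, hi) = lower_upper;
    // Concatenate the two halves into a double-width dividend.
    let wide = ((hi as u128) << 64) | (lo as u128);
    // Widen the divisor to the same width and take the remainder there.
    let r = wide % (rhs.get() as u128);
    // The remainder is smaller than `rhs`, so it fits the narrow type again.
    r as u64
}
```

The final narrowing cast is lossless because the remainder is always strictly less than the divisor.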
This splits
Signed-off-by: Andrew Whitehead <[email protected]>
I also have vartime division for BoxedUint implemented, which I was planning to add in a separate PR.
Just a note: I experimented a little with trying to piggyback

    pub const fn rem_wide_vartime<const R: usize>(lower_upper: (Self, Self), rhs: &NonZero<Self>) -> Self
    where
        Self: Concat<Output = Uint<R>>,
        Uint<R>: Split<Output = Self>,
    {
        let (lo, hi) = lower_upper;
        let wide_self: <Self as Concat>::Output = lo.concat(&hi);
        let (_q, r) = wide_self.div_rem_vartime(rhs);
        r
    }

The problem is, it then requires an
Implements faster vartime division (vartime with the divisor only) for Uint based on Knuth's TAOCP volume 2, as outlined at https://janmr.com/blog/2014/04/basic-multiple-precision-long-division/
This does not address vartime division for BoxedUint or the other TODOs in #511
Relevant benchmarks: