@pitdicker
I should just have done this with the previous PR.

The function that calculates a widening multiply for u64 using 32-bit multiplies is now a macro, so it can also be used for u128 with 64-bit multiplies. It is not yet optimal on 32-bit architectures, but it is much better than what we had.
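
For readers unfamiliar with the technique, here is a minimal sketch of a widening multiply built from half-width multiplies and generated by a macro. The macro name `wmul_via_halves`, the function names `wmul_u64` and `wmul_u128`, and the exact decomposition are assumptions made for illustration; the code merged in this PR may be structured differently.

```rust
// Hypothetical macro, not the one from this PR: generates a function that
// returns the (high, low) halves of the full double-width product a * b,
// using only multiplies whose operands fit in half of the type's width.
macro_rules! wmul_via_halves {
    ($name:ident, $ty:ty, $half_bits:expr) => {
        fn $name(a: $ty, b: $ty) -> ($ty, $ty) {
            // Mask selecting the lower half of the bits.
            const LOWER_MASK: $ty = !0 >> $half_bits;

            let a_lo = a & LOWER_MASK;
            let a_hi = a >> $half_bits;
            let b_lo = b & LOWER_MASK;
            let b_hi = b >> $half_bits;

            // Four partial products; each operand is below 2^half_bits,
            // so none of these multiplications can overflow.
            let ll = a_lo * b_lo;
            let lh = a_lo * b_hi;
            let hl = a_hi * b_lo;
            let hh = a_hi * b_hi;

            // Middle column: carry out of `ll` plus the low halves of the
            // cross terms. The sum is below 3 * 2^half_bits, so it fits.
            let mid = (ll >> $half_bits) + (lh & LOWER_MASK) + (hl & LOWER_MASK);

            let lo = (mid << $half_bits) | (ll & LOWER_MASK);
            let hi = hh + (lh >> $half_bits) + (hl >> $half_bits) + (mid >> $half_bits);
            (hi, lo)
        }
    };
}

// u64 result from 32-bit halves, and u128 result from 64-bit halves.
wmul_via_halves!(wmul_u64, u64, 32);
wmul_via_halves!(wmul_u128, u128, 64);
```

With these definitions, `wmul_u64(a, b)` should agree with `(a as u128) * (b as u128)` split into its upper and lower 64 bits, which gives an easy way to cross-check the u64 instantiation.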

Benchmarks before:

test distr_range_i128         ... bench:     141,265 ns/iter (+/- 3,125) = 113 MB/s (x86_64)
test distr_range_i128         ... bench:     399,455 ns/iter (+/- 6,462) = 40 MB/s (x86)

After:

test distr_range_i128         ... bench:       9,076 ns/iter (+/- 103) = 1762 MB/s (x86_64)
test distr_range_i128         ... bench:      55,194 ns/iter (+/- 472) = 289 MB/s (x86)

@dhardy
Owner

dhardy commented Dec 31, 2017

Nice work.

@dhardy dhardy merged commit 1f9ce3a into dhardy:master Dec 31, 2017
@pitdicker pitdicker deleted the range_128 branch December 31, 2017 18:12