Sure,
U256::add time: [1.6429 ns 1.6462 ns 1.6499 ns]
U256::div (lo/lo) time: [8.1225 ns 8.1331 ns 8.1452 ns]
U256::div (hi/lo) time: [19.067 ns 19.083 ns 19.100 ns]
U256::div (hi/hi) time: [171.46 ns 172.14 ns 172.87 ns]
U256::mul time: [2.6050 ns 2.6064 ns 2.6078 ns]
U256::sub time: [1.6185 ns 1.6207 ns 1.6232 ns]
U256::shl time: [1.9257 ns 1.9313 ns 1.9371 ns]
U256::shr time: [1.7386 ns 1.7402 ns 1.7424 ns]
U256::ctlz time: [798.04 ps 798.59 ps 799.28 ps]
U256::cttz time: [881.57 ps 882.25 ps 883.15 ps]
U256::rotate_left time: [2.9088 ns 2.9195 ns 2.9321 ns]
U256::rotate_right time: [2.7154 ns 2.7197 ns 2.7249 ns]

and after:
U256::add time: [1.6201 ns 1.6232 ns 1.6267 ns]
U256::div (lo/lo) time: [8.1195 ns 8.1333 ns 8.1489 ns]
U256::div (hi/lo) time: [19.281 ns 19.295 ns 19.309 ns]
U256::div (hi/hi) time: [24.055 ns 24.095 ns 24.141 ns]
U256::mul time: [2.6083 ns 2.6114 ns 2.6154 ns]
U256::sub time: [1.6206 ns 1.6233 ns 1.6262 ns]
U256::shl time: [1.9175 ns 1.9225 ns 1.9283 ns]
U256::shr time: [1.7385 ns 1.7405 ns 1.7439 ns]
U256::ctlz time: [799.83 ps 801.46 ps 803.33 ps]
U256::cttz time: [884.29 ps 885.95 ps 887.80 ps]
U256::rotate_left time: [2.9030 ns 2.9095 ns 2.9165 ns]
U256::rotate_right time: [2.7006 ns 2.7032 ns 2.7068 ns]
|
The previous post was from my desktop PC (Intel i9-12900K). Below is the Graviton 3 (Amazon AWS c7g instance) benchmark:
U256::div (lo/lo) time: [20.475 ns 20.477 ns 20.479 ns]
U256::div (hi/lo) time: [94.612 ns 94.762 ns 94.916 ns]
U256::div (hi/hi) time: [336.59 ns 336.61 ns 336.63 ns]
U256::mul time: [5.5886 ns 5.5908 ns 5.5931 ns]
U256::sub time: [5.4582 ns 5.4601 ns 5.4619 ns]
U256::shl time: [4.2159 ns 4.2164 ns 4.2169 ns]
U256::shr time: [4.1342 ns 4.1354 ns 4.1369 ns]
U256::ctlz time: [2.4419 ns 2.4420 ns 2.4420 ns]
U256::cttz time: [2.4857 ns 2.4858 ns 2.4858 ns]
U256::rotate_left time: [5.7390 ns 5.7577 ns 5.7763 ns]
U256::rotate_right time: [5.5316 ns 5.5386 ns 5.5461 ns]

After:
U256::add time: [5.4514 ns 5.4521 ns 5.4529 ns]
U256::div (lo/lo) time: [20.474 ns 20.480 ns 20.485 ns]
U256::div (hi/lo) time: [94.189 ns 94.202 ns 94.215 ns]
U256::div (hi/hi) time: [68.181 ns 68.192 ns 68.204 ns]
U256::mul time: [5.5856 ns 5.5879 ns 5.5902 ns]
U256::sub time: [5.4708 ns 5.4722 ns 5.4736 ns]
U256::shl time: [4.2157 ns 4.2162 ns 4.2167 ns]
U256::shr time: [4.1333 ns 4.1340 ns 4.1348 ns]
U256::ctlz time: [2.4419 ns 2.4420 ns 2.4421 ns]
U256::cttz time: [2.4824 ns 2.4834 ns 2.4842 ns]
U256::rotate_left time: [5.7638 ns 5.7770 ns 5.7908 ns]
U256::rotate_right time: [5.5470 ns 5.5548 ns 5.5626 ns]
|
Graviton 2 (Amazon AWS c6g*):
U256::add time: [8.3676 ns 8.3697 ns 8.3718 ns]
U256::div (lo/lo) time: [31.313 ns 31.330 ns 31.345 ns]
U256::div (hi/lo) time: [142.58 ns 142.62 ns 142.66 ns]
U256::div (hi/hi) time: [432.31 ns 432.42 ns 432.55 ns]
U256::mul time: [21.649 ns 21.654 ns 21.661 ns]
U256::sub time: [8.3902 ns 8.3923 ns 8.3945 ns]
U256::shl time: [5.9219 ns 5.9233 ns 5.9248 ns]
U256::shr time: [5.8867 ns 5.8869 ns 5.8871 ns]
U256::ctlz time: [3.2917 ns 3.2931 ns 3.2947 ns]
U256::cttz time: [3.2852 ns 3.2862 ns 3.2872 ns]
U256::rotate_left time: [8.6338 ns 8.6409 ns 8.6480 ns]
U256::rotate_right time: [8.5947 ns 8.5984 ns 8.6023 ns]

After:
U256::add time: [8.3776 ns 8.3796 ns 8.3818 ns]
U256::div (lo/lo) time: [31.350 ns 31.386 ns 31.423 ns]
U256::div (hi/lo) time: [143.55 ns 143.63 ns 143.72 ns]
U256::div (hi/hi) time: [132.81 ns 132.88 ns 132.96 ns]
U256::mul time: [21.653 ns 21.658 ns 21.664 ns]
U256::sub time: [8.3743 ns 8.3750 ns 8.3757 ns]
U256::shl time: [5.9165 ns 5.9166 ns 5.9168 ns]
U256::shr time: [5.8862 ns 5.8863 ns 5.8865 ns]
U256::ctlz time: [3.2861 ns 3.2862 ns 3.2863 ns]
U256::cttz time: [3.2849 ns 3.2857 ns 3.2865 ns]
U256::rotate_left time: [8.6495 ns 8.6562 ns 8.6631 ns]
|
Intel(R) Xeon(R) CPU E3-1270 v6 @ 3.80GHz:
U256::div (lo/lo) time: [34.656 ns 34.746 ns 34.849 ns]
U256::div (hi/lo) time: [101.12 ns 101.34 ns 101.57 ns]
U256::div (hi/hi) time: [260.59 ns 261.64 ns 263.05 ns]
U256::mul time: [7.2249 ns 7.2486 ns 7.2787 ns]
U256::sub time: [3.6266 ns 3.6375 ns 3.6507 ns]
U256::shl time: [4.6821 ns 4.7187 ns 4.7582 ns]
U256::shr time: [5.4234 ns 5.6343 ns 5.8779 ns]
U256::ctlz time: [1.7979 ns 1.8071 ns 1.8172 ns]
U256::cttz time: [1.6234 ns 1.6275 ns 1.6317 ns]
U256::rotate_left time: [7.6074 ns 7.6350 ns 7.6708 ns]
U256::rotate_right time: [6.9703 ns 6.9921 ns 7.0137 ns]

After:
U256::add time: [3.8436 ns 4.0444 ns 4.2842 ns]
U256::div (lo/lo) time: [34.616 ns 34.722 ns 34.836 ns]
U256::div (hi/lo) time: [102.35 ns 102.50 ns 102.67 ns]
U256::div (hi/hi) time: [76.856 ns 80.566 ns 84.570 ns]
U256::mul time: [7.2813 ns 7.3206 ns 7.3653 ns]
U256::sub time: [3.6143 ns 3.6161 ns 3.6180 ns]
U256::shl time: [4.9900 ns 5.2624 ns 5.5863 ns]
U256::shr time: [5.2371 ns 5.3298 ns 5.4537 ns]
U256::ctlz time: [1.9459 ns 2.0639 ns 2.2043 ns]
U256::cttz time: [1.6350 ns 1.6419 ns 1.6494 ns]
U256::rotate_left time: [7.7551 ns 7.7892 ns 7.8216 ns]
U256::rotate_right time: [6.8823 ns 6.8993 ns 6.9163 ns]
|
Please note that I did not test on a big-endian architecture; to be honest, I don't know how to test that. I'm reasonably sure that div_mod_knuth is architecture-agnostic, but it would be better to test it.
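To illustrate why limb-based arithmetic would be expected to behave the same on big-endian targets, here is a minimal sketch. It uses `u128` split into two `u64` limbs as a stand-in for the wider type; `split_u128` and `join_u128` are hypothetical helpers, not functions from this crate. As long as limbs are produced with shifts and casts (operations on values) rather than by reinterpreting memory, limb 0 is the low half on every target; only byte-level views such as `to_le_bytes` and `to_be_bytes` differ.

```rust
// Hypothetical helpers for illustration only; not part of the crate under review.

/// Split a value into limbs using shifts and casts. Limb 0 is the low 64 bits
/// regardless of the target's byte order, because no memory is reinterpreted.
fn split_u128(x: u128) -> [u64; 2] {
    [x as u64, (x >> 64) as u64]
}

/// Reassemble the value from its limbs, again purely with value-level operations.
fn join_u128(limbs: [u64; 2]) -> u128 {
    ((limbs[1] as u128) << 64) | limbs[0] as u128
}

fn main() {
    let x = 0x0123_4567_89ab_cdef_fedc_ba98_7654_3210u128;
    let limbs = split_u128(x);

    // These hold on little-endian and big-endian targets alike.
    assert_eq!(limbs[0], 0xfedc_ba98_7654_3210);
    assert_eq!(limbs[1], 0x0123_4567_89ab_cdef);
    assert_eq!(join_u128(limbs), x);

    // Byte order only shows up when the value is viewed as bytes.
    assert_eq!(x.to_le_bytes()[0], 0x10);
    assert_eq!(x.to_be_bytes()[0], 0x01);

    let endian = if cfg!(target_endian = "big") { "big" } else { "little" };
    println!("checks passed on a {endian}-endian target");
}
```

One possible way to actually exercise a big-endian target, assuming emulation is acceptable, is to run the test suite for a target such as `s390x-unknown-linux-gnu` or `powerpc64-unknown-linux-gnu` under QEMU user-mode emulation.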
|
Sorry, just getting back to this now. The changes look great. I also ran the more comprehensive benchmarks from the |
|
I'm just going to run the fuzzer for a bit to see if there are any regressions with division, then this should be good to merge. |
Added div_mod_knuth division for two large (>128-bit) numbers. Resolves #16.
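For readers unfamiliar with the technique, the following is a minimal sketch of Knuth's Algorithm D (long division on multi-digit numbers), assuming base-2^32 digits stored least-significant first. It is not the code from this change: `knuth_d_sketch`, `digits`, `value`, and `div_mod_u128` are hypothetical names, and the sketch is checked against native `u128` division rather than against a 256-bit type.

```rust
const B: u64 = 1 << 32; // digit base (assumption: 32-bit digits)

/// Sketch of Knuth's Algorithm D. `u` and `v` are little-endian base-2^32 digits.
/// Requires v.len() >= 2, a nonzero top digit in `v`, and u.len() >= v.len().
/// Returns (quotient digits, remainder digits).
fn knuth_d_sketch(u: &[u32], v: &[u32]) -> (Vec<u32>, Vec<u32>) {
    let (m, n) = (u.len(), v.len());
    assert!(n >= 2 && m >= n && v[n - 1] != 0);

    // D1: normalize so that the divisor's top digit has its high bit set.
    let s = v[n - 1].leading_zeros();
    let shl = |hi: u32, lo: u32| (((((hi as u64) << 32) | lo as u64) << s) >> 32) as u32;
    let vn: Vec<u32> = (0..n).map(|i| shl(v[i], if i > 0 { v[i - 1] } else { 0 })).collect();
    let mut un: Vec<u32> = (0..=m)
        .map(|i| shl(if i < m { u[i] } else { 0 }, if i > 0 { u[i - 1] } else { 0 }))
        .collect();

    let mut q = vec![0u32; m - n + 1];
    for j in (0..=m - n).rev() {
        // D3: estimate the quotient digit from the top two dividend digits,
        // then correct the estimate using the divisor's second digit.
        let num = ((un[j + n] as u64) << 32) | un[j + n - 1] as u64;
        let (vtop, vnext) = (vn[n - 1] as u64, vn[n - 2] as u64);
        let (mut qhat, mut rhat) = (num / vtop, num % vtop);
        while qhat >= B || qhat * vnext > ((rhat << 32) | un[j + n - 2] as u64) {
            qhat -= 1;
            rhat += vtop;
            if rhat >= B {
                break;
            }
        }
        // D4: multiply and subtract qhat * vn from the current window of un.
        let mut k: i64 = 0;
        for i in 0..n {
            let p = qhat * vn[i] as u64;
            let t = un[i + j] as i64 - k - (p & 0xFFFF_FFFF) as i64;
            un[i + j] = t as u32;
            k = (p >> 32) as i64 - (t >> 32);
        }
        let t = un[j + n] as i64 - k;
        un[j + n] = t as u32;
        // D5/D6: a negative result means qhat was one too large; add the divisor back.
        if t < 0 {
            qhat -= 1;
            let mut carry = 0u64;
            for i in 0..n {
                let sum = un[i + j] as u64 + vn[i] as u64 + carry;
                un[i + j] = sum as u32;
                carry = sum >> 32;
            }
            un[j + n] = un[j + n].wrapping_add(carry as u32);
        }
        q[j] = qhat as u32;
    }
    // D8: undo the normalization shift to recover the remainder.
    let r: Vec<u32> = (0..n)
        .map(|i| ((((un[i + 1] as u64) << 32) | un[i] as u64) >> s) as u32)
        .collect();
    (q, r)
}

/// Little-endian base-2^32 digits of a u128 (always four digits).
fn digits(x: u128) -> Vec<u32> {
    (0..4u32).map(|i| (x >> (32 * i)) as u32).collect()
}

/// Value of at most four little-endian base-2^32 digits.
fn value(d: &[u32]) -> u128 {
    d.iter().rev().fold(0u128, |acc, &x| (acc << 32) | x as u128)
}

/// Wrapper for checking the sketch against native division; requires b >= 2^32.
fn div_mod_u128(a: u128, b: u128) -> (u128, u128) {
    let mut v = digits(b);
    while v.last() == Some(&0) {
        v.pop(); // the algorithm needs a nonzero top divisor digit
    }
    let (q, r) = knuth_d_sketch(&digits(a), &v);
    (value(&q), value(&r))
}

fn main() {
    let a = 0xfedc_ba98_7654_3210_0123_4567_89ab_cdefu128;
    for b in [1u128 << 32, 0x1_0000_0001, u64::MAX as u128, (1 << 100) + 12345, a, u128::MAX] {
        assert_eq!(div_mod_u128(a, b), (a / b, a % b));
    }
    println!("sketch agrees with native u128 division");
}
```

The sketch handles only divisors of at least two digits; a single-digit divisor would need the simpler short-division path, which is omitted here.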