Skip to content

Add Stable Div Rem#32

Merged
nlordell merged 8 commits into
nlordell:mainfrom
NCGThompson:div-rem
Apr 16, 2025
Merged

Add Stable Div Rem#32
nlordell merged 8 commits into
nlordell:mainfrom
NCGThompson:div-rem

Conversation

@NCGThompson

Copy link
Copy Markdown
Contributor

closes #11

@NCGThompson NCGThompson marked this pull request as ready for review November 13, 2023 04:29
@NCGThompson

Copy link
Copy Markdown
Contributor Author

You may notice that I included a compiler hint for in some of the divide by zero checks. e.g.:

https://github.com/NCGThompson/ethnum-rs/blob/295dd6d0a0743026c3ca593e750b0268b65e4357/src/uint.rs#L372-L386

Inequality operators (e.g. <=, >) are overloaded with the cmp method, which has a custom implementation based on the lexicographical comparison of (u128, u128):

https://github.com/NCGThompson/ethnum-rs/blob/295dd6d0a0743026c3ca593e750b0268b65e4357/src/uint/cmp.rs#L18-L20

The eq method, on the other hand, is derived automatically. It performs the eq method on each of U256's fields, which is a single array of u128s.

In LLVM, the lexicographical comparison is translated to two icmps, while the eq method is translated to a bcmp. The bcmp is very fast, and can take advantage of the SIMD registers. When compared to a constant 0 on x86_64 with 256-bit registers, bcmp translates to test ymmm, ymmm. icmp is not optimized the same way.

However, the compiler seems to inconsistently elide redundant bcmps, even if they are compiled in the same module. For example,

if a == 0 {
    unsafe { unreachable_unchecked() }
}
if a == 0 {
    panic!();
}

compiles to a single bcmp with all optimizations on. Also, the compiler doesn't understand that the result of the bcmp indicates the result of the icmp of each of the u128s. Notably, the / expressions used in udivmod4 use icmp, so

if a > u128::MAX {
    unsafe { unreachable_unchecked() }
}
if a == 0 {
    unsafe { unreachable_unchecked() }
}
if a == 0 {
    panic!();
}
_ = U256::new(3) / a;

compiles to a 256-bit bcmp and a 128-bit icmp each with their own panic statement, even if udivmod4 is inlined and optimized.

So, you may ask, why not use a custom implementation of eq that includes the compiler hint? I definitely think that is worth considering, but the effect varies based on the context as well as the opt settings. I don't know if it will be a pessimization in some cases. Interestingly, if lto is not set to "fat", then it makes the biggest difference when comparing a 256-bit to a 128-bit and the hint is placed directly in the self.eq(128) rather than a custom self.eq(Self).

Comment thread src/uint.rs Outdated

@nlordell nlordell left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! Thanks so much for this, just a small nit on the alignment.

@NCGThompson NCGThompson marked this pull request as draft November 17, 2023 16:43
@NCGThompson

Copy link
Copy Markdown
Contributor Author

Fixed the alignment. The other change I made was to the docs and doc tests of some methods. One could argue it was better before, but this way makes it more consistent with std.

@NCGThompson NCGThompson marked this pull request as ready for review November 17, 2023 18:58
@NCGThompson

Copy link
Copy Markdown
Contributor Author

I found an error. In the std library, overflow is a guaranteed panic even without overflow checks. I made this depend on debug assertions.

@nlordell nlordell merged commit cbb3860 into nlordell:main Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add {checked_,overflowing_,wrapping_,}div_rem

2 participants