-
Notifications
You must be signed in to change notification settings - Fork 955
Don't use AVX2 instructions if the CPU don't support it #2571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #2571 +/- ##
============================================
+ Coverage 72.20% 72.22% +0.02%
============================================
Files 126 126
Lines 70660 70660
============================================
+ Hits 51017 51034 +17
+ Misses 19643 19626 -17
🚀 New features to boost your workflow:
|
|
Trying to bring back my memory regarding SSE2 support. I am not sure SSE2 (which is something we use under the HAVE_X86_SIMD) require AVX2 support. Also per the bug you mentioned I think it will not really solve the problem in case we are using an AVX2 cross compiled binary on cpu which has no AVX2 support. I think a better fix would be to do the same thing we did for hyperloglog in #1293 where we make a runtime check on the CPU property supports avx2. |
I don't know. I only know that I got the illegal instruction error and that the server's CPU supports AVX but not AVX2. I found these intrinsic functions used in the commit that introduced the error:
Sure, whatever works :-) Too bad everyone will have to suffer a runtime check every time these functions are going to be used though. It'll bloat the binary a bit too. Btw: What cross compiler does not define |
|
Well spotted. #1741 was released in 8.1 so we'll need to backport this fix to 8.1. Please use a more descriptive text in the PRs, rather than only referring to an issue. When we merge, we squash-merge all the commits and use the PR title and description as the commit message. (We don't always need an issue btw. Only a PR is fine.)
The same amd64 binary can be used by all amd64 machines, with or without various AVX support. Binary files are often distributed in distros and in containers. Thus, we can't rely entirely on compile-time checks. There is a way to do this check only once at program startup though: GNU IFUNC resolver. @zhulipeng did that in this PR: |
|
If it's only for the BITCOUNT command, I suppose a runtime check is good enough though. It's just a single check in the execution of the command, which is probably negligible. |
Ok, will do!
I see! It didn't even occur to me that people would use it from distros rather than compiling it from source 😄 |
|
How about adding this check in bitcount.c? long long serverPopcount(void *s, long count) {
#if HAVE_X86_SIMD
/* If length of s >= 256 bits and the CPU supports AVX2,
* we prefer to use the SIMD version */
- if (count >= 32) {
+ if (count >= 32 && __builtin_cpu_supports("avx2")) {
return popcountAVX2(s, count);
}
#endif |
That seems to do the trick! I'll let all tests finish before updating the PR. Edit: All tests passed now! |
bitops.c: serverPopcount() used popcountAVX2(), which as the name
implies requires AVX2 support, on AVX-only machines, causing an
"illegal instruction" error.
Added a __builtin_cpu_supports("avx2") check and falling back to the
platform agnostic version if AVX2 is not supported.
Fixes valkey-io#2570
Signed-off-by: Ted Lyngmo <[email protected]>
roshkhatri
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| /* If length of s >= 256 bits and the CPU supports AVX2, | ||
| * we prefer to use the SIMD version */ | ||
| if (count >= 32) { | ||
| if (count >= 32 && __builtin_cpu_supports("avx2")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since I do not expect avx2 support to be dynamically disabled on a CPU, you could use static storage for that and only calculate once when this function is called
| if (count >= 32 && __builtin_cpu_supports("avx2")) { | |
| static bool cpu_supports_avx2 = __builtin_cpu_supports("avx2"); | |
| if (count >= 32 && cpu_supports_avx2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't work. I tested this with smaller file.
error: initializer element is not constant
5 | static bool cpu_supports_avx2 = __builtin_cpu_supports("avx2");
| ^~~~~~~~~~~~~~~~~~~~~~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the builtin results in test BYTE PTR __cpu_model[rip+13], 4 (https://godbolt.org/z/61vxW7de5), so the assembly expansion of anything __cpuid -related doesn't seem to be there. I can't say for sure if this test is cheaper than using a call_once construct like this: https://godbolt.org/z/TfoG35f13, but the call_once version feels more expensive.
If we could use non-constant initializers for globals, like in C++, the test would become a cmp: https://godbolt.org/z/cxe5M76aM
If the bitwise test is more expensive than cmp we could use __attribute__((constructor)) if that's supported by all implementations valkey aims to support: https://godbolt.org/z/j7oY9qqjY
... but realistically, I can't think of a way to get it any cheaper than what's currently in this PR without trickery.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Yes, I believe the CPU id structures are initialized once (by __builtin_cpu_init()) to make the runtime checks faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last idea: We could split the function in multiple implementations to get rid of the runtime check and initialize a function pointer instead: https://godbolt.org/z/713srTrcb
I'm unsure if dereferencing that is cheaper than the test or cmp though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's basically the IFUNC resolver approach. We did it for string2ll() in #2099 because it's called many times for every command, e.g. when parsing the RESP command arguments.
Since we need to backport this fix, it's good to keep it minimal.
We can perhaps do the resolver function in a separate PR, if the performance gain is worth the added complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. we can wait with that.
also we can just calculate it once in a constructor function and it would also be helpful for other cases:
static bool cpu_supports_avx2;
__attribute__((constructor))
static void init_cpu_flags(void) {
cpu_supports_avx2 = __builtin_cpu_supports("avx2");
}
or call it from main...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh @TedLyngmo I see you already referred to this proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__builtin_cpu_supports("avx2") expands to the same assembly as checking a single variable, so it's just as cheap. It can be seen in the godbolt links posted above.
`bitops.c`: `serverPopcount()` used `popcountAVX2()`, which as the name
implies requires AVX2 support, on AVX-only machines, causing an "illegal
instruction" error.
Added a `__builtin_cpu_supports("avx2")` check and falling back to the
platform agnostic version if AVX2 is not supported.
Fixes valkey-io#2570
Signed-off-by: Ted Lyngmo <[email protected]>
`bitops.c`: `serverPopcount()` used `popcountAVX2()`, which as the name
implies requires AVX2 support, on AVX-only machines, causing an "illegal
instruction" error.
Added a `__builtin_cpu_supports("avx2")` check and falling back to the
platform agnostic version if AVX2 is not supported.
Fixes #2570
Signed-off-by: Ted Lyngmo <[email protected]>
`bitops.c`: `serverPopcount()` used `popcountAVX2()`, which as the name
implies requires AVX2 support, on AVX-only machines, causing an "illegal
instruction" error.
Added a `__builtin_cpu_supports("avx2")` check and falling back to the
platform agnostic version if AVX2 is not supported.
Fixes valkey-io#2570
Signed-off-by: Ted Lyngmo <[email protected]>
`bitops.c`: `serverPopcount()` used `popcountAVX2()`, which as the name
implies requires AVX2 support, on AVX-only machines, causing an "illegal
instruction" error.
Added a `__builtin_cpu_supports("avx2")` check and falling back to the
platform agnostic version if AVX2 is not supported.
Fixes valkey-io#2570
Signed-off-by: Ted Lyngmo <[email protected]>
`bitops.c`: `serverPopcount()` used `popcountAVX2()`, which as the name
implies requires AVX2 support, on AVX-only machines, causing an "illegal
instruction" error.
Added a `__builtin_cpu_supports("avx2")` check and falling back to the
platform agnostic version if AVX2 is not supported.
Fixes #2570
Signed-off-by: Ted Lyngmo <[email protected]>
`bitops.c`: `serverPopcount()` used `popcountAVX2()`, which as the name
implies requires AVX2 support, on AVX-only machines, causing an "illegal
instruction" error.
Added a `__builtin_cpu_supports("avx2")` check and falling back to the
platform agnostic version if AVX2 is not supported.
Fixes valkey-io#2570
Signed-off-by: Ted Lyngmo <[email protected]>
Signed-off-by: Harkrishn Patro <[email protected]>
bitops.c:serverPopcount()usedpopcountAVX2(), which as the name implies requires AVX2 support, on AVX-only machines, causing an "illegal instruction" error.Added a
__builtin_cpu_supports("avx2")check and falling back to the platform agnostic version if AVX2 is not supported.Fixes #2570