Skip to content

Conversation

@TedLyngmo
Copy link
Contributor

@TedLyngmo TedLyngmo commented Sep 2, 2025

bitops.c: serverPopcount() used popcountAVX2(), which as the name implies requires AVX2 support, on AVX-only machines, causing an "illegal instruction" error.

Added a __builtin_cpu_supports("avx2") check and falling back to the platform agnostic version if AVX2 is not supported.

Fixes #2570

@codecov
Copy link

codecov bot commented Sep 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.22%. Comparing base (eebed88) to head (2b9e8e3).
⚠️ Report is 4 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2571      +/-   ##
============================================
+ Coverage     72.20%   72.22%   +0.02%     
============================================
  Files           126      126              
  Lines         70660    70660              
============================================
+ Hits          51017    51034      +17     
+ Misses        19643    19626      -17     
Files with missing lines Coverage Δ
src/bitops.c 94.08% <100.00%> (ø)

... and 15 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ranshid
Copy link
Member

ranshid commented Sep 2, 2025

Trying to bring back my memory regarding SSE2 support. I am not sure SSE2 (which is something we use under the HAVE_X86_SIMD) require AVX2 support. Also per the bug you mentioned I think it will not really solve the problem in case we are using an AVX2 cross compiled binary on cpu which has no AVX2 support. I think a better fix would be to do the same thing we did for hyperloglog in #1293 where we make a runtime check on the CPU property supports avx2.

@ranshid ranshid self-requested a review September 2, 2025 15:42
@TedLyngmo
Copy link
Contributor Author

TedLyngmo commented Sep 2, 2025

Trying to bring back my memory regarding SSE2 support. I am not sure SSE2 (which is something we use under the HAVE_X86_SIMD) require AVX2 support.

I don't know. I only know that I got the illegal instruction error and that the server's CPU supports AVX but not AVX2. I found these intrinsic functions used in the commit that introduced the error:

 _mm256_add_epi64
 _mm256_add_epi8
 _mm256_and_si256
 _mm256_extract_epi64
 _mm256_loadu_si256
 _mm256_set1_epi8
 _mm256_setr_epi8
 _mm256_setzero_si256
 _mm256_shuffle_epi8
 _mm256_srli_epi16

Also per the bug you mentioned I think it will not really solve the problem in case we are using an AVX2 cross compiled binary on cpu which has no AVX2 support. I think a better fix would be to do the same thing we did for hyperloglog in #1293 where we make a runtime check on the CPU property supports avx2.

Sure, whatever works :-) Too bad everyone will have to suffer a runtime check every time these functions are going to be used though. It'll bloat the binary a bit too.

Btw: What cross compiler does not define __AVX2__ if the target supports it? That sounds like a flaw that should be reported.

@zuiderkwast
Copy link
Contributor

zuiderkwast commented Sep 2, 2025

Well spotted. #1741 was released in 8.1 so we'll need to backport this fix to 8.1.

Please use a more descriptive text in the PRs, rather than only referring to an issue. When we merge, we squash-merge all the commits and use the PR title and description as the commit message. (We don't always need an issue btw. Only a PR is fine.)

Sure, whatever works :-) Too bad everyone will have to suffer a runtime check every time these functions are going to be used though. It'll bloat the binary a bit too.

Btw: What cross compiler does not define __AVX2__ if the target supports it? That sounds like a flaw that should be reported.

The same amd64 binary can be used by all amd64 machines, with or without various AVX support. Binary files are often distributed in distros and in containers. Thus, we can't rely entirely on compile-time checks.

There is a way to do this check only once at program startup though: GNU IFUNC resolver. @zhulipeng did that in this PR:

@zuiderkwast
Copy link
Contributor

If it's only for the BITCOUNT command, I suppose a runtime check is good enough though. It's just a single check in the execution of the command, which is probably negligible.

@TedLyngmo
Copy link
Contributor Author

Well spotted. #1741 was released in 8.1 so we'll need to backport this fix to 8.1.

Please use a more descriptive test in the PRs, rather than only referring to an issue. When we merge, we squash-merge all the commits and use the PR title and description as the commit message. (We don't always need an issue btw. Only a PR is fine.)

Ok, will do!

The same amd64 binary can be used by all amd64 machines, with or without various AVX support. Binary files are often distributed in distros and in containers. Thus, we can't rely entirely on compile-time checks.

I see! It didn't even occur to me that people would use it from distros rather than compiling it from source 😄

@zuiderkwast
Copy link
Contributor

How about adding this check in bitcount.c?

 long long serverPopcount(void *s, long count) {
 #if HAVE_X86_SIMD
     /* If length of s >= 256 bits and the CPU supports AVX2,
      * we prefer to use the SIMD version */
-    if (count >= 32) {
+    if (count >= 32 && __builtin_cpu_supports("avx2")) {
         return popcountAVX2(s, count);
     }
 #endif

@TedLyngmo
Copy link
Contributor Author

TedLyngmo commented Sep 2, 2025

How about adding this check in bitcount.c?

-    if (count >= 32) {
+    if (count >= 32 && __builtin_cpu_supports("avx2")) {

That seems to do the trick! I'll let all tests finish before updating the PR.

Edit: All tests passed now!

bitops.c: serverPopcount() used popcountAVX2(), which as the name
implies requires AVX2 support, on AVX-only machines, causing an
"illegal instruction" error.

Added a __builtin_cpu_supports("avx2") check and falling back to the
platform agnostic version if AVX2 is not supported.

Fixes valkey-io#2570

Signed-off-by: Ted Lyngmo <[email protected]>
Copy link
Member

@roshkhatri roshkhatri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

/* If length of s >= 256 bits and the CPU supports AVX2,
* we prefer to use the SIMD version */
if (count >= 32) {
if (count >= 32 && __builtin_cpu_supports("avx2")) {
Copy link
Member

@ranshid ranshid Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since I do not expect avx2 support to be dynamically disabled on a CPU, you could use static storage for that and only calculate once when this function is called

Suggested change
if (count >= 32 && __builtin_cpu_supports("avx2")) {
static bool cpu_supports_avx2 = __builtin_cpu_supports("avx2");
if (count >= 32 && cpu_supports_avx2) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't work. I tested this with smaller file.

error: initializer element is not constant
    5 |     static bool cpu_supports_avx2 = __builtin_cpu_supports("avx2");
      |                                     ^~~~~~~~~~~~~~~~~~~~~~

Copy link
Contributor Author

@TedLyngmo TedLyngmo Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the builtin results in test BYTE PTR __cpu_model[rip+13], 4 (https://godbolt.org/z/61vxW7de5), so the assembly expansion of anything __cpuid -related doesn't seem to be there. I can't say for sure if this test is cheaper than using a call_once construct like this: https://godbolt.org/z/TfoG35f13, but the call_once version feels more expensive.

If we could use non-constant initializers for globals, like in C++, the test would become a cmp: https://godbolt.org/z/cxe5M76aM

If the bitwise test is more expensive than cmp we could use __attribute__((constructor)) if that's supported by all implementations valkey aims to support: https://godbolt.org/z/j7oY9qqjY

... but realistically, I can't think of a way to get it any cheaper than what's currently in this PR without trickery.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Yes, I believe the CPU id structures are initialized once (by __builtin_cpu_init()) to make the runtime checks faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last idea: We could split the function in multiple implementations to get rid of the runtime check and initialize a function pointer instead: https://godbolt.org/z/713srTrcb
I'm unsure if dereferencing that is cheaper than the test or cmp though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's basically the IFUNC resolver approach. We did it for string2ll() in #2099 because it's called many times for every command, e.g. when parsing the RESP command arguments.

Since we need to backport this fix, it's good to keep it minimal.

We can perhaps do the resolver function in a separate PR, if the performance gain is worth the added complexity.

Copy link
Member

@ranshid ranshid Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. we can wait with that.

also we can just calculate it once in a constructor function and it would also be helpful for other cases:

static bool cpu_supports_avx2;

__attribute__((constructor))
static void init_cpu_flags(void) {
    cpu_supports_avx2 = __builtin_cpu_supports("avx2");
}

or call it from main...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh @TedLyngmo I see you already referred to this proposal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__builtin_cpu_supports("avx2") expands to the same assembly as checking a single variable, so it's just as cheap. It can be seen in the godbolt links posted above.

@zuiderkwast zuiderkwast merged commit 971cdb7 into valkey-io:unstable Sep 3, 2025
52 checks passed
@github-project-automation github-project-automation bot moved this to To be backported in Valkey 8.1 Sep 3, 2025
@TedLyngmo TedLyngmo deleted the avx2 branch September 3, 2025 17:03
@zuiderkwast zuiderkwast added the release-notes This issue should get a line item in the release notes label Sep 4, 2025
rjd15372 pushed a commit to rjd15372/valkey that referenced this pull request Sep 19, 2025
`bitops.c`: `serverPopcount()` used `popcountAVX2()`, which as the name
implies requires AVX2 support, on AVX-only machines, causing an "illegal
instruction" error.

Added a `__builtin_cpu_supports("avx2")` check and falling back to the
platform agnostic version if AVX2 is not supported.

Fixes valkey-io#2570

Signed-off-by: Ted Lyngmo <[email protected]>
rjd15372 pushed a commit that referenced this pull request Sep 23, 2025
`bitops.c`: `serverPopcount()` used `popcountAVX2()`, which as the name
implies requires AVX2 support, on AVX-only machines, causing an "illegal
instruction" error.

Added a `__builtin_cpu_supports("avx2")` check and falling back to the
platform agnostic version if AVX2 is not supported.

Fixes #2570

Signed-off-by: Ted Lyngmo <[email protected]>
@ranshid ranshid moved this from To be backported to In Progress in Valkey 8.1 Sep 30, 2025
ranshid pushed a commit to ranshid/valkey that referenced this pull request Sep 30, 2025
`bitops.c`: `serverPopcount()` used `popcountAVX2()`, which as the name
implies requires AVX2 support, on AVX-only machines, causing an "illegal
instruction" error.

Added a `__builtin_cpu_supports("avx2")` check and falling back to the
platform agnostic version if AVX2 is not supported.

Fixes valkey-io#2570

Signed-off-by: Ted Lyngmo <[email protected]>
ranshid pushed a commit to ranshid/valkey that referenced this pull request Sep 30, 2025
`bitops.c`: `serverPopcount()` used `popcountAVX2()`, which as the name
implies requires AVX2 support, on AVX-only machines, causing an "illegal
instruction" error.

Added a `__builtin_cpu_supports("avx2")` check and falling back to the
platform agnostic version if AVX2 is not supported.

Fixes valkey-io#2570

Signed-off-by: Ted Lyngmo <[email protected]>
@ranshid ranshid moved this from In Progress to 8.1.4 in Valkey 8.1 Sep 30, 2025
@ranshid ranshid moved this from 8.1.4 to To be backported in Valkey 8.1 Sep 30, 2025
zuiderkwast pushed a commit that referenced this pull request Oct 1, 2025
`bitops.c`: `serverPopcount()` used `popcountAVX2()`, which as the name
implies requires AVX2 support, on AVX-only machines, causing an "illegal
instruction" error.

Added a `__builtin_cpu_supports("avx2")` check and falling back to the
platform agnostic version if AVX2 is not supported.

Fixes #2570

Signed-off-by: Ted Lyngmo <[email protected]>
@zuiderkwast zuiderkwast moved this from To be backported to 8.1.4 in Valkey 8.1 Oct 1, 2025
hpatro pushed a commit to hpatro/valkey that referenced this pull request Oct 3, 2025
`bitops.c`: `serverPopcount()` used `popcountAVX2()`, which as the name
implies requires AVX2 support, on AVX-only machines, causing an "illegal
instruction" error.

Added a `__builtin_cpu_supports("avx2")` check and falling back to the
platform agnostic version if AVX2 is not supported.

Fixes valkey-io#2570

Signed-off-by: Ted Lyngmo <[email protected]>
Signed-off-by: Harkrishn Patro <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes This issue should get a line item in the release notes

Projects

Status: 8.1.4
Status: Done

Development

Successfully merging this pull request may close these issues.

[BUG] Illegal instruction executed

5 participants