-
Notifications
You must be signed in to change notification settings - Fork 955
Optimize bitcount command by SIMD #1741
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4ba7902 to
8da4334
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## unstable #1741 +/- ##
=========================================
Coverage 71.08% 71.08%
=========================================
Files 123 123
Lines 65552 65583 +31
=========================================
+ Hits 46596 46622 +26
- Misses 18956 18961 +5
|
c2b2c1f to
0eedd92
Compare
|
Can you add tests to make sure the same results are achieved, with, and without AVX? |
|
@xbasel Sure! If there is anything I can help, please let me know. |
there was a typo :), I asked whether you can add tests (unless you think the existing tests are enough) |
No problem, I will review the tests to ensure they are enough and get back to you. |
9b9a8aa to
b420ef4
Compare
Test code to check scalar/vector result consistency has been committed. |
a5dc330 to
1cdeebe
Compare
2a05fef to
a3a31cf
Compare
zuiderkwast
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very good. Just some minor comments.
First I didn't understand anything of this AVX2 code, but I found the paper and I realized this was not invented in this PR. 😄
aaa211a to
16fd90a
Compare
zuiderkwast
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Optimize bitcount command by SIMD Signed-off-by: chzhoo <[email protected]>
Fix complie warning Signed-off-by: chzhoo <[email protected]>
Forget include of stdint.h Signed-off-by: chzhoo <[email protected]>
Optimize unit test error messages Signed-off-by: chzhoo <[email protected]>
Add citations to the original author's work Signed-off-by: chzhoo <[email protected]>
097a241 to
114b338
Compare
The unit test was added in #1741. It fails when compiled with UBSAN. Using a local array like `char buf[size]` is undefined behaviour when `size == 0`. This fix just makes it size 1 in this case. The failure I got locally: unit/test_bitops.c:28:13: runtime error: variable length array bound evaluates to non-positive value 0 Signed-off-by: Viktor Söderqvist <[email protected]>
In a recent PR valkey-io#1741 the new header `<immintrin.h>` was added, which transitively includes `<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes a call to the `malloc` function. The Valkey server code explicitly sets the malloc function as a deprecated function in `server.h`: ```c void *malloc(size_t size) __attribute__((deprecated)); ``` The Valkey server code is then compiled with `-Werror=deprecated-declarations` option to detect the uses of deprecated functions like `malloc`, and due to this, when the `bitops.c` file is compiled with Clang, it fails with the following error: ``` In file included from bitops.c:33: In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26: In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31: /usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations] 35 | return malloc(__size); | ^ ./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here 3874 | void *malloc(size_t size) __attribute__((deprecated)); ``` There is a difference in behavior though, between GCC and Clang. The `bitops.c` file compiles successfully with GCC. I don't know exactly why GCC does not issue a warning in this case. My best guess is that GCC does not issue warnings from code of the standard library. To fix the build error in clang, we explicitly use `pragma` macro to tell clang to ignore deprecated declarations warnings in `bitops.c`. Signed-off-by: Ricardo Dias <[email protected]>
In a recent PR #1741 the new header `<immintrin.h>` was added, which transitively includes `<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes a call to the `malloc` function. The Valkey server code explicitly sets the malloc function as a deprecated function in `server.h`: ```c void *malloc(size_t size) __attribute__((deprecated)); ``` The Valkey server code is then compiled with `-Werror=deprecated-declarations` option to detect the uses of deprecated functions like `malloc`, and due to this, when the `bitops.c` file is compiled with Clang, fails with the following error: ``` In file included from bitops.c:33: In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26: In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31: /usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations] 35 | return malloc(__size); | ^ ./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here 3874 | void *malloc(size_t size) __attribute__((deprecated)); ``` There is a difference in behavior though, between GCC and Clang. The `bitops.c` file compiles successfully with GCC. I don't know exactly why GCC does not issue a warning in this case. My best guess is that GCC does not issue warnings from code of the standard library. To fix the build error in clang, we explicitly use `pragma` macro to tell clang to ignore deprecated declarations warnings in `bitops.c`. --------- Signed-off-by: Ricardo Dias <[email protected]> Signed-off-by: Ricardo Dias <[email protected]> Co-authored-by: Madelyn Olson <[email protected]>
**Background** - Currently, we implement bitcount using a lookup table method - By SIMD, parallel table lookups can be achieved, which boosts performance - Most x86 servers support the AVX2 instruction set **BenchMark** | Value Size | QPS (After optimization) | QPS (Before optimization) | change | | ---- | ---- | ---- | ---- | |16 B | 114925| 115924 | -0.8%| |256 B| 112619 | 112201| +0.3%| |4 KB| 105523|96251| +9.6%| |64 KB|79723|36796| +116%| |1MB|21306|3466|+514%| CPU: AMD EPYC 9754 128-Core Processor * 8 OS: Ubuntu Server 22.04 LTS 64bit Memory: 16GB VM: Tencent cloud SA5.2XLARGE16 **Test Plan** Pending. Will add test if it looks okay **Other** This PR is based on https://github.com/WojciechMula/sse-popcount/blob/master/popcnt-avx2-lookup.cpp --------- Signed-off-by: chzhoo <[email protected]>
The unit test was added in #1741. It fails when compiled with UBSAN. Using a local array like `char buf[size]` is undefined behaviour when `size == 0`. This fix just makes it size 1 in this case. The failure I got locally: unit/test_bitops.c:28:13: runtime error: variable length array bound evaluates to non-positive value 0 Signed-off-by: Viktor Söderqvist <[email protected]>
In a recent PR #1741 the new header `<immintrin.h>` was added, which transitively includes `<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes a call to the `malloc` function. The Valkey server code explicitly sets the malloc function as a deprecated function in `server.h`: ```c void *malloc(size_t size) __attribute__((deprecated)); ``` The Valkey server code is then compiled with `-Werror=deprecated-declarations` option to detect the uses of deprecated functions like `malloc`, and due to this, when the `bitops.c` file is compiled with Clang, fails with the following error: ``` In file included from bitops.c:33: In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26: In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31: /usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations] 35 | return malloc(__size); | ^ ./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here 3874 | void *malloc(size_t size) __attribute__((deprecated)); ``` There is a difference in behavior though, between GCC and Clang. The `bitops.c` file compiles successfully with GCC. I don't know exactly why GCC does not issue a warning in this case. My best guess is that GCC does not issue warnings from code of the standard library. To fix the build error in clang, we explicitly use `pragma` macro to tell clang to ignore deprecated declarations warnings in `bitops.c`. --------- Signed-off-by: Ricardo Dias <[email protected]> Signed-off-by: Ricardo Dias <[email protected]> Co-authored-by: Madelyn Olson <[email protected]>
**Background** - Currently, we implement bitcount using a lookup table method - By SIMD, parallel table lookups can be achieved, which boosts performance - Most x86 servers support the AVX2 instruction set **BenchMark** | Value Size | QPS (After optimization) | QPS (Before optimization) | change | | ---- | ---- | ---- | ---- | |16 B | 114925| 115924 | -0.8%| |256 B| 112619 | 112201| +0.3%| |4 KB| 105523|96251| +9.6%| |64 KB|79723|36796| +116%| |1MB|21306|3466|+514%| CPU: AMD EPYC 9754 128-Core Processor * 8 OS: Ubuntu Server 22.04 LTS 64bit Memory: 16GB VM: Tencent cloud SA5.2XLARGE16 **Test Plan** Pending. Will add test if it looks okay **Other** This PR is based on https://github.com/WojciechMula/sse-popcount/blob/master/popcnt-avx2-lookup.cpp --------- Signed-off-by: chzhoo <[email protected]>
The unit test was added in valkey-io#1741. It fails when compiled with UBSAN. Using a local array like `char buf[size]` is undefined behaviour when `size == 0`. This fix just makes it size 1 in this case. The failure I got locally: unit/test_bitops.c:28:13: runtime error: variable length array bound evaluates to non-positive value 0 Signed-off-by: Viktor Söderqvist <[email protected]>
In a recent PR valkey-io#1741 the new header `<immintrin.h>` was added, which transitively includes `<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes a call to the `malloc` function. The Valkey server code explicitly sets the malloc function as a deprecated function in `server.h`: ```c void *malloc(size_t size) __attribute__((deprecated)); ``` The Valkey server code is then compiled with `-Werror=deprecated-declarations` option to detect the uses of deprecated functions like `malloc`, and due to this, when the `bitops.c` file is compiled with Clang, fails with the following error: ``` In file included from bitops.c:33: In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26: In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31: /usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations] 35 | return malloc(__size); | ^ ./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here 3874 | void *malloc(size_t size) __attribute__((deprecated)); ``` There is a difference in behavior though, between GCC and Clang. The `bitops.c` file compiles successfully with GCC. I don't know exactly why GCC does not issue a warning in this case. My best guess is that GCC does not issue warnings from code of the standard library. To fix the build error in clang, we explicitly use `pragma` macro to tell clang to ignore deprecated declarations warnings in `bitops.c`. --------- Signed-off-by: Ricardo Dias <[email protected]> Signed-off-by: Ricardo Dias <[email protected]> Co-authored-by: Madelyn Olson <[email protected]>
**Background** - Currently, we implement bitcount using a lookup table method - By SIMD, parallel table lookups can be achieved, which boosts performance - Most x86 servers support the AVX2 instruction set **BenchMark** | Value Size | QPS (After optimization) | QPS (Before optimization) | change | | ---- | ---- | ---- | ---- | |16 B | 114925| 115924 | -0.8%| |256 B| 112619 | 112201| +0.3%| |4 KB| 105523|96251| +9.6%| |64 KB|79723|36796| +116%| |1MB|21306|3466|+514%| CPU: AMD EPYC 9754 128-Core Processor * 8 OS: Ubuntu Server 22.04 LTS 64bit Memory: 16GB VM: Tencent cloud SA5.2XLARGE16 **Test Plan** Pending. Will add test if it looks okay **Other** This PR is based on https://github.com/WojciechMula/sse-popcount/blob/master/popcnt-avx2-lookup.cpp --------- Signed-off-by: chzhoo <[email protected]>
The unit test was added in valkey-io#1741. It fails when compiled with UBSAN. Using a local array like `char buf[size]` is undefined behaviour when `size == 0`. This fix just makes it size 1 in this case. The failure I got locally: unit/test_bitops.c:28:13: runtime error: variable length array bound evaluates to non-positive value 0 Signed-off-by: Viktor Söderqvist <[email protected]>
In a recent PR valkey-io#1741 the new header `<immintrin.h>` was added, which transitively includes `<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes a call to the `malloc` function. The Valkey server code explicitly sets the malloc function as a deprecated function in `server.h`: ```c void *malloc(size_t size) __attribute__((deprecated)); ``` The Valkey server code is then compiled with `-Werror=deprecated-declarations` option to detect the uses of deprecated functions like `malloc`, and due to this, when the `bitops.c` file is compiled with Clang, fails with the following error: ``` In file included from bitops.c:33: In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26: In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31: /usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations] 35 | return malloc(__size); | ^ ./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here 3874 | void *malloc(size_t size) __attribute__((deprecated)); ``` There is a difference in behavior though, between GCC and Clang. The `bitops.c` file compiles successfully with GCC. I don't know exactly why GCC does not issue a warning in this case. My best guess is that GCC does not issue warnings from code of the standard library. To fix the build error in clang, we explicitly use `pragma` macro to tell clang to ignore deprecated declarations warnings in `bitops.c`. --------- Signed-off-by: Ricardo Dias <[email protected]> Signed-off-by: Ricardo Dias <[email protected]> Co-authored-by: Madelyn Olson <[email protected]>
In a recent PR valkey-io#1741 the new header `<immintrin.h>` was added, which transitively includes `<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes a call to the `malloc` function. The Valkey server code explicitly sets the malloc function as a deprecated function in `server.h`: ```c void *malloc(size_t size) __attribute__((deprecated)); ``` The Valkey server code is then compiled with `-Werror=deprecated-declarations` option to detect the uses of deprecated functions like `malloc`, and due to this, when the `bitops.c` file is compiled with Clang, fails with the following error: ``` In file included from bitops.c:33: In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26: In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31: /usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations] 35 | return malloc(__size); | ^ ./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here 3874 | void *malloc(size_t size) __attribute__((deprecated)); ``` There is a difference in behavior though, between GCC and Clang. The `bitops.c` file compiles successfully with GCC. I don't know exactly why GCC does not issue a warning in this case. My best guess is that GCC does not issue warnings from code of the standard library. To fix the build error in clang, we explicitly use `pragma` macro to tell clang to ignore deprecated declarations warnings in `bitops.c`. --------- Signed-off-by: Ricardo Dias <[email protected]> Signed-off-by: Ricardo Dias <[email protected]> Co-authored-by: Madelyn Olson <[email protected]> Signed-off-by: Shai Zarka <[email protected]>
**Background** - Currently, we implement bitcount using a lookup table method - By SIMD, parallel table lookups can be achieved, which boosts performance - Most x86 servers support the AVX2 instruction set **BenchMark** | Value Size | QPS (After optimization) | QPS (Before optimization) | change | | ---- | ---- | ---- | ---- | |16 B | 114925| 115924 | -0.8%| |256 B| 112619 | 112201| +0.3%| |4 KB| 105523|96251| +9.6%| |64 KB|79723|36796| +116%| |1MB|21306|3466|+514%| CPU: AMD EPYC 9754 128-Core Processor * 8 OS: Ubuntu Server 22.04 LTS 64bit Memory: 16GB VM: Tencent cloud SA5.2XLARGE16 **Test Plan** Pending. Will add test if it looks okay **Other** This PR is based on https://github.com/WojciechMula/sse-popcount/blob/master/popcnt-avx2-lookup.cpp --------- Signed-off-by: chzhoo <[email protected]>
The unit test was added in valkey-io#1741. It fails when compiled with UBSAN. Using a local array like `char buf[size]` is undefined behaviour when `size == 0`. This fix just makes it size 1 in this case. The failure I got locally: unit/test_bitops.c:28:13: runtime error: variable length array bound evaluates to non-positive value 0 Signed-off-by: Viktor Söderqvist <[email protected]>
In a recent PR valkey-io#1741 the new header `<immintrin.h>` was added, which transitively includes `<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes a call to the `malloc` function. The Valkey server code explicitly sets the malloc function as a deprecated function in `server.h`: ```c void *malloc(size_t size) __attribute__((deprecated)); ``` The Valkey server code is then compiled with `-Werror=deprecated-declarations` option to detect the uses of deprecated functions like `malloc`, and due to this, when the `bitops.c` file is compiled with Clang, fails with the following error: ``` In file included from bitops.c:33: In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26: In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31: /usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations] 35 | return malloc(__size); | ^ ./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here 3874 | void *malloc(size_t size) __attribute__((deprecated)); ``` There is a difference in behavior though, between GCC and Clang. The `bitops.c` file compiles successfully with GCC. I don't know exactly why GCC does not issue a warning in this case. My best guess is that GCC does not issue warnings from code of the standard library. To fix the build error in clang, we explicitly use `pragma` macro to tell clang to ignore deprecated declarations warnings in `bitops.c`. --------- Signed-off-by: Ricardo Dias <[email protected]> Signed-off-by: Ricardo Dias <[email protected]> Co-authored-by: Madelyn Olson <[email protected]>
Background
BenchMark
CPU: AMD EPYC 9754 128-Core Processor * 8
OS: Ubuntu Server 22.04 LTS 64bit
Memory: 16GB
VM: Tencent cloud SA5.2XLARGE16
Test Plan
Pending. Will add test if it looks okay
Other
This PR is based on https://github.com/WojciechMula/sse-popcount/blob/master/popcnt-avx2-lookup.cpp