Skip to content

Conversation

@chzhoo
Copy link
Contributor

@chzhoo chzhoo commented Feb 15, 2025

Background

  • Currently, we implement bitcount using a lookup table method
  • By SIMD, parallel table lookups can be achieved, which boosts performance
  • Most x86 servers support the AVX2 instruction set

BenchMark

Value Size QPS (After optimization) QPS (Before optimization) change
16 B 114925 115924 -0.8%
256 B 112619 112201 +0.3%
4 KB 105523 96251 +9.6%
64 KB 79723 36796 +116%
1MB 21306 3466 +514%

CPU: AMD EPYC 9754 128-Core Processor * 8
OS: Ubuntu Server 22.04 LTS 64bit
Memory: 16GB
VM: Tencent cloud SA5.2XLARGE16

Test Plan
Pending. Will add test if it looks okay

Other
This PR is based on https://github.com/WojciechMula/sse-popcount/blob/master/popcnt-avx2-lookup.cpp

@chzhoo chzhoo force-pushed the simd_popcount branch 2 times, most recently from 4ba7902 to 8da4334 Compare February 15, 2025 13:39
@codecov
Copy link

codecov bot commented Feb 15, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.08%. Comparing base (3a64260) to head (114b338).
Report is 10 commits behind head on unstable.

Additional details and impacted files
@@            Coverage Diff            @@
##           unstable    #1741   +/-   ##
=========================================
  Coverage     71.08%   71.08%           
=========================================
  Files           123      123           
  Lines         65552    65583   +31     
=========================================
+ Hits          46596    46622   +26     
- Misses        18956    18961    +5     
Files with missing lines Coverage Δ
src/bitops.c 90.00% <100.00%> (-3.63%) ⬇️

... and 12 files with indirect coverage changes

@chzhoo chzhoo force-pushed the simd_popcount branch 3 times, most recently from c2b2c1f to 0eedd92 Compare February 15, 2025 17:23
@xbasel xbasel self-requested a review February 17, 2025 11:58
@xbasel
Copy link
Member

xbasel commented Feb 17, 2025

Can you add tests to make sure the same results are achieved, with, and without AVX?

@chzhoo
Copy link
Contributor Author

chzhoo commented Feb 17, 2025

@xbasel Sure! If there is anything I can help, please let me know.

@xbasel
Copy link
Member

xbasel commented Feb 17, 2025

@xbasel Sure! If there is anything I can help, please let me know.

there was a typo :), I asked whether you can add tests (unless you think the existing tests are enough)

@chzhoo
Copy link
Contributor Author

chzhoo commented Feb 17, 2025

@xbasel Sure! If there is anything I can help, please let me know.

there was a typo :), I asked whether you can add tests (unless you think the existing tests are enough)

No problem, I will review the tests to ensure they are enough and get back to you.

@chzhoo
Copy link
Contributor Author

chzhoo commented Feb 18, 2025

Can you add tests to make sure the same results are achieved, with, and without AVX?

Test code to check scalar/vector result consistency has been committed.

@chzhoo chzhoo force-pushed the simd_popcount branch 3 times, most recently from a5dc330 to 1cdeebe Compare February 18, 2025 17:19
Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very good. Just some minor comments.

First I didn't understand anything of this AVX2 code, but I found the paper and I realized this was not invented in this PR. 😄

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Optimize bitcount command by SIMD

Signed-off-by: chzhoo <[email protected]>
Fix complie warning

Signed-off-by: chzhoo <[email protected]>
Forget include of stdint.h

Signed-off-by: chzhoo <[email protected]>
Optimize unit test error messages

Signed-off-by: chzhoo <[email protected]>
Add citations to the original author's work

Signed-off-by: chzhoo <[email protected]>
@zuiderkwast zuiderkwast added the release-notes This issue should get a line item in the release notes label Feb 25, 2025
@zuiderkwast zuiderkwast merged commit 79d5047 into valkey-io:unstable Feb 26, 2025
51 checks passed
zuiderkwast added a commit that referenced this pull request Feb 26, 2025
The unit test was added in #1741. It fails when compiled with UBSAN.
Using a local array like `char buf[size]` is undefined behaviour when
`size == 0`. This fix just makes it size 1 in this case.

The failure I got locally:

unit/test_bitops.c:28:13: runtime error: variable length array bound
evaluates to non-positive value 0

Signed-off-by: Viktor Söderqvist <[email protected]>
@chzhoo chzhoo deleted the simd_popcount branch February 27, 2025 04:55
rjd15372 added a commit to rjd15372/valkey that referenced this pull request Feb 27, 2025
In a recent PR valkey-io#1741 the new
header `<immintrin.h>` was added, which transitively includes
`<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes
a call to the `malloc` function.

The Valkey server code explicitly sets the malloc function as a
deprecated function in `server.h`:
```c
void *malloc(size_t size) __attribute__((deprecated));
```

The Valkey server code is then compiled with
`-Werror=deprecated-declarations` option to detect the uses of
deprecated functions like `malloc`, and due to this, when the `bitops.c`
file is compiled with Clang, it fails with the following error:

```
In file included from bitops.c:33:
In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26:
In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31:
/usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations]
   35 |     return malloc(__size);
      |            ^
./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here
 3874 | void *malloc(size_t size) __attribute__((deprecated));
```

There is a difference in behavior though, between GCC and Clang. The
`bitops.c` file compiles successfully with GCC.

I don't know exactly why GCC does not issue a warning in this case. My
best guess is that GCC does not issue warnings from code of the standard
library.

To fix the build error in clang, we explicitly use `pragma` macro to
tell clang to ignore deprecated declarations warnings in `bitops.c`.

Signed-off-by: Ricardo Dias <[email protected]>
hwware pushed a commit that referenced this pull request Feb 27, 2025
In a recent PR #1741 the new
header `<immintrin.h>` was added, which transitively includes
`<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes
a call to the `malloc` function.

The Valkey server code explicitly sets the malloc function as a
deprecated function in `server.h`:
```c
void *malloc(size_t size) __attribute__((deprecated));
```

The Valkey server code is then compiled with
`-Werror=deprecated-declarations` option to detect the uses of
deprecated functions like `malloc`, and due to this, when the `bitops.c`
file is compiled with Clang, fails with the following error:

```
In file included from bitops.c:33:
In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26:
In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31:
/usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations]
   35 |     return malloc(__size);
      |            ^
./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here
 3874 | void *malloc(size_t size) __attribute__((deprecated));
```

There is a difference in behavior though, between GCC and Clang. The
`bitops.c` file compiles successfully with GCC.

I don't know exactly why GCC does not issue a warning in this case. My
best guess is that GCC does not issue warnings from code of the standard
library.

To fix the build error in clang, we explicitly use `pragma` macro to
tell clang to ignore deprecated declarations warnings in `bitops.c`.

---------

Signed-off-by: Ricardo Dias <[email protected]>
Signed-off-by: Ricardo Dias <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
zuiderkwast pushed a commit that referenced this pull request Mar 18, 2025
**Background**

- Currently, we implement bitcount using a lookup table method
- By SIMD, parallel table lookups can be achieved, which boosts
performance
- Most x86 servers support the AVX2 instruction set

**BenchMark**
| Value Size | QPS (After optimization) | QPS (Before optimization) |
change |
| ---- | ---- | ---- | ---- |
|16 B | 114925| 115924 |  -0.8%|
|256 B| 112619 | 112201|  +0.3%|
|4 KB| 105523|96251| +9.6%|
|64 KB|79723|36796| +116%|
|1MB|21306|3466|+514%|

CPU: AMD EPYC 9754 128-Core Processor * 8
OS:   Ubuntu Server 22.04 LTS 64bit
Memory: 16GB
VM: Tencent cloud SA5.2XLARGE16

**Test Plan**
Pending. Will add test if it looks okay

**Other**
This PR is based on
https://github.com/WojciechMula/sse-popcount/blob/master/popcnt-avx2-lookup.cpp

---------

Signed-off-by: chzhoo <[email protected]>
zuiderkwast added a commit that referenced this pull request Mar 18, 2025
The unit test was added in #1741. It fails when compiled with UBSAN.
Using a local array like `char buf[size]` is undefined behaviour when
`size == 0`. This fix just makes it size 1 in this case.

The failure I got locally:

unit/test_bitops.c:28:13: runtime error: variable length array bound
evaluates to non-positive value 0

Signed-off-by: Viktor Söderqvist <[email protected]>
zuiderkwast pushed a commit that referenced this pull request Mar 18, 2025
In a recent PR #1741 the new
header `<immintrin.h>` was added, which transitively includes
`<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes
a call to the `malloc` function.

The Valkey server code explicitly sets the malloc function as a
deprecated function in `server.h`:
```c
void *malloc(size_t size) __attribute__((deprecated));
```

The Valkey server code is then compiled with
`-Werror=deprecated-declarations` option to detect the uses of
deprecated functions like `malloc`, and due to this, when the `bitops.c`
file is compiled with Clang, fails with the following error:

```
In file included from bitops.c:33:
In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26:
In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31:
/usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations]
   35 |     return malloc(__size);
      |            ^
./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here
 3874 | void *malloc(size_t size) __attribute__((deprecated));
```

There is a difference in behavior though, between GCC and Clang. The
`bitops.c` file compiles successfully with GCC.

I don't know exactly why GCC does not issue a warning in this case. My
best guess is that GCC does not issue warnings from code of the standard
library.

To fix the build error in clang, we explicitly use `pragma` macro to
tell clang to ignore deprecated declarations warnings in `bitops.c`.

---------

Signed-off-by: Ricardo Dias <[email protected]>
Signed-off-by: Ricardo Dias <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
xbasel pushed a commit to xbasel/valkey that referenced this pull request Mar 27, 2025
**Background**

- Currently, we implement bitcount using a lookup table method
- By SIMD, parallel table lookups can be achieved, which boosts
performance
- Most x86 servers support the AVX2 instruction set

**BenchMark**
| Value Size | QPS (After optimization) | QPS (Before optimization) |
change |
| ---- | ---- | ---- | ---- |
|16 B | 114925| 115924 |  -0.8%|
|256 B| 112619 | 112201|  +0.3%|
|4 KB| 105523|96251| +9.6%|
|64 KB|79723|36796| +116%|
|1MB|21306|3466|+514%|

CPU: AMD EPYC 9754 128-Core Processor * 8
OS:   Ubuntu Server 22.04 LTS 64bit
Memory: 16GB
VM: Tencent cloud SA5.2XLARGE16

**Test Plan**
Pending. Will add test if it looks okay

**Other**
This PR is based on
https://github.com/WojciechMula/sse-popcount/blob/master/popcnt-avx2-lookup.cpp

---------

Signed-off-by: chzhoo <[email protected]>
xbasel pushed a commit to xbasel/valkey that referenced this pull request Mar 27, 2025
The unit test was added in valkey-io#1741. It fails when compiled with UBSAN.
Using a local array like `char buf[size]` is undefined behaviour when
`size == 0`. This fix just makes it size 1 in this case.

The failure I got locally:

unit/test_bitops.c:28:13: runtime error: variable length array bound
evaluates to non-positive value 0

Signed-off-by: Viktor Söderqvist <[email protected]>
xbasel pushed a commit to xbasel/valkey that referenced this pull request Mar 27, 2025
In a recent PR valkey-io#1741 the new
header `<immintrin.h>` was added, which transitively includes
`<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes
a call to the `malloc` function.

The Valkey server code explicitly sets the malloc function as a
deprecated function in `server.h`:
```c
void *malloc(size_t size) __attribute__((deprecated));
```

The Valkey server code is then compiled with
`-Werror=deprecated-declarations` option to detect the uses of
deprecated functions like `malloc`, and due to this, when the `bitops.c`
file is compiled with Clang, fails with the following error:

```
In file included from bitops.c:33:
In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26:
In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31:
/usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations]
   35 |     return malloc(__size);
      |            ^
./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here
 3874 | void *malloc(size_t size) __attribute__((deprecated));
```

There is a difference in behavior though, between GCC and Clang. The
`bitops.c` file compiles successfully with GCC.

I don't know exactly why GCC does not issue a warning in this case. My
best guess is that GCC does not issue warnings from code of the standard
library.

To fix the build error in clang, we explicitly use `pragma` macro to
tell clang to ignore deprecated declarations warnings in `bitops.c`.

---------

Signed-off-by: Ricardo Dias <[email protected]>
Signed-off-by: Ricardo Dias <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
xbasel pushed a commit to xbasel/valkey that referenced this pull request Mar 27, 2025
**Background**

- Currently, we implement bitcount using a lookup table method
- By SIMD, parallel table lookups can be achieved, which boosts
performance
- Most x86 servers support the AVX2 instruction set

**BenchMark**
| Value Size | QPS (After optimization) | QPS (Before optimization) |
change |
| ---- | ---- | ---- | ---- |
|16 B | 114925| 115924 |  -0.8%|
|256 B| 112619 | 112201|  +0.3%|
|4 KB| 105523|96251| +9.6%|
|64 KB|79723|36796| +116%|
|1MB|21306|3466|+514%|

CPU: AMD EPYC 9754 128-Core Processor * 8
OS:   Ubuntu Server 22.04 LTS 64bit
Memory: 16GB
VM: Tencent cloud SA5.2XLARGE16

**Test Plan**
Pending. Will add test if it looks okay

**Other**
This PR is based on
https://github.com/WojciechMula/sse-popcount/blob/master/popcnt-avx2-lookup.cpp

---------

Signed-off-by: chzhoo <[email protected]>
xbasel pushed a commit to xbasel/valkey that referenced this pull request Mar 27, 2025
The unit test was added in valkey-io#1741. It fails when compiled with UBSAN.
Using a local array like `char buf[size]` is undefined behaviour when
`size == 0`. This fix just makes it size 1 in this case.

The failure I got locally:

unit/test_bitops.c:28:13: runtime error: variable length array bound
evaluates to non-positive value 0

Signed-off-by: Viktor Söderqvist <[email protected]>
xbasel pushed a commit to xbasel/valkey that referenced this pull request Mar 27, 2025
In a recent PR valkey-io#1741 the new
header `<immintrin.h>` was added, which transitively includes
`<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes
a call to the `malloc` function.

The Valkey server code explicitly sets the malloc function as a
deprecated function in `server.h`:
```c
void *malloc(size_t size) __attribute__((deprecated));
```

The Valkey server code is then compiled with
`-Werror=deprecated-declarations` option to detect the uses of
deprecated functions like `malloc`, and due to this, when the `bitops.c`
file is compiled with Clang, fails with the following error:

```
In file included from bitops.c:33:
In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26:
In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31:
/usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations]
   35 |     return malloc(__size);
      |            ^
./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here
 3874 | void *malloc(size_t size) __attribute__((deprecated));
```

There is a difference in behavior though, between GCC and Clang. The
`bitops.c` file compiles successfully with GCC.

I don't know exactly why GCC does not issue a warning in this case. My
best guess is that GCC does not issue warnings from code of the standard
library.

To fix the build error in clang, we explicitly use `pragma` macro to
tell clang to ignore deprecated declarations warnings in `bitops.c`.

---------

Signed-off-by: Ricardo Dias <[email protected]>
Signed-off-by: Ricardo Dias <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
zarkash-aws pushed a commit to zarkash-aws/valkey that referenced this pull request Apr 6, 2025
In a recent PR valkey-io#1741 the new
header `<immintrin.h>` was added, which transitively includes
`<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes
a call to the `malloc` function.

The Valkey server code explicitly sets the malloc function as a
deprecated function in `server.h`:
```c
void *malloc(size_t size) __attribute__((deprecated));
```

The Valkey server code is then compiled with
`-Werror=deprecated-declarations` option to detect the uses of
deprecated functions like `malloc`, and due to this, when the `bitops.c`
file is compiled with Clang, fails with the following error:

```
In file included from bitops.c:33:
In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26:
In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31:
/usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations]
   35 |     return malloc(__size);
      |            ^
./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here
 3874 | void *malloc(size_t size) __attribute__((deprecated));
```

There is a difference in behavior though, between GCC and Clang. The
`bitops.c` file compiles successfully with GCC.

I don't know exactly why GCC does not issue a warning in this case. My
best guess is that GCC does not issue warnings from code of the standard
library.

To fix the build error in clang, we explicitly use `pragma` macro to
tell clang to ignore deprecated declarations warnings in `bitops.c`.

---------

Signed-off-by: Ricardo Dias <[email protected]>
Signed-off-by: Ricardo Dias <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
Signed-off-by: Shai Zarka <[email protected]>
@madolson madolson moved this from Candidate for release to Done in Valkey 8.1 Apr 7, 2025
murphyjacob4 pushed a commit to enjoy-binbin/valkey that referenced this pull request Apr 13, 2025
**Background**

- Currently, we implement bitcount using a lookup table method
- By SIMD, parallel table lookups can be achieved, which boosts
performance
- Most x86 servers support the AVX2 instruction set

**BenchMark**
| Value Size | QPS (After optimization) | QPS (Before optimization) |
change |
| ---- | ---- | ---- | ---- |
|16 B | 114925| 115924 |  -0.8%|
|256 B| 112619 | 112201|  +0.3%|
|4 KB| 105523|96251| +9.6%|
|64 KB|79723|36796| +116%|
|1MB|21306|3466|+514%|

CPU: AMD EPYC 9754 128-Core Processor * 8
OS:   Ubuntu Server 22.04 LTS 64bit
Memory: 16GB
VM: Tencent cloud SA5.2XLARGE16

**Test Plan**
Pending. Will add test if it looks okay

**Other**
This PR is based on
https://github.com/WojciechMula/sse-popcount/blob/master/popcnt-avx2-lookup.cpp

---------

Signed-off-by: chzhoo <[email protected]>
murphyjacob4 pushed a commit to enjoy-binbin/valkey that referenced this pull request Apr 13, 2025
The unit test was added in valkey-io#1741. It fails when compiled with UBSAN.
Using a local array like `char buf[size]` is undefined behaviour when
`size == 0`. This fix just makes it size 1 in this case.

The failure I got locally:

unit/test_bitops.c:28:13: runtime error: variable length array bound
evaluates to non-positive value 0

Signed-off-by: Viktor Söderqvist <[email protected]>
murphyjacob4 pushed a commit to enjoy-binbin/valkey that referenced this pull request Apr 13, 2025
In a recent PR valkey-io#1741 the new
header `<immintrin.h>` was added, which transitively includes
`<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes
a call to the `malloc` function.

The Valkey server code explicitly sets the malloc function as a
deprecated function in `server.h`:
```c
void *malloc(size_t size) __attribute__((deprecated));
```

The Valkey server code is then compiled with
`-Werror=deprecated-declarations` option to detect the uses of
deprecated functions like `malloc`, and due to this, when the `bitops.c`
file is compiled with Clang, fails with the following error:

```
In file included from bitops.c:33:
In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26:
In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31:
/usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations]
   35 |     return malloc(__size);
      |            ^
./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here
 3874 | void *malloc(size_t size) __attribute__((deprecated));
```

There is a difference in behavior though, between GCC and Clang. The
`bitops.c` file compiles successfully with GCC.

I don't know exactly why GCC does not issue a warning in this case. My
best guess is that GCC does not issue warnings from code of the standard
library.

To fix the build error in clang, we explicitly use `pragma` macro to
tell clang to ignore deprecated declarations warnings in `bitops.c`.

---------

Signed-off-by: Ricardo Dias <[email protected]>
Signed-off-by: Ricardo Dias <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance release-notes This issue should get a line item in the release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants