Contributor

@zuiderkwast zuiderkwast commented Sep 20, 2022

Small sets that contain non-integer elements are now listpack encoded. By default this applies to sets of up to 128 elements with at most 64 bytes per element; both limits are controlled by the new configs `set-max-listpack-entries` and `set-max-listpack-value`. This saves memory for small sets compared to using a hashtable.

Sets with only integers, even very small sets, are still intset encoded (up to 1G limit, etc.). Larger sets are hashtable encoded.

Possible conversions when elements are added:

intset -> listpack
listpack -> hashtable
intset -> hashtable

Note: No conversion happens when elements are deleted. If all elements are deleted and then added again, the set is deleted and recreated, thus implicitly converted to a smaller encoding.
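
The conversion rules above can be sketched as a small decision function. This is a simplified model for illustration, not the code in this PR: the function and constant names are made up, the listpack limits are the defaults named in the description, the intset limit of 512 is an assumed default, and the exact boundary comparisons are assumptions.

```python
INTSET, LISTPACK, HASHTABLE = "intset", "listpack", "hashtable"

# 128 and 64 are the defaults named in the PR description.
MAX_LISTPACK_ENTRIES = 128
MAX_LISTPACK_VALUE = 64
# Assumed value for the already existing intset size limit.
MAX_INTSET_ENTRIES = 512


def is_integer(element: str) -> bool:
    """True only for canonical integer strings ("12" yes, "012" no)."""
    try:
        return str(int(element)) == element
    except ValueError:
        return False


def encoding_after_add(encoding: str, size: int, element: str) -> str:
    """Encoding of a set after adding a new `element` to `size` members."""
    new_size = size + 1
    fits_listpack = (new_size <= MAX_LISTPACK_ENTRIES
                     and len(element) <= MAX_LISTPACK_VALUE)
    if encoding == INTSET:
        if is_integer(element):
            # Stays an intset until the assumed intset limit is exceeded.
            return INTSET if new_size <= MAX_INTSET_ENTRIES else HASHTABLE
        # First non-integer element: intset -> listpack or intset -> hashtable.
        return LISTPACK if fits_listpack else HASHTABLE
    if encoding == LISTPACK:
        # listpack -> hashtable once a limit is crossed.
        return LISTPACK if fits_listpack else HASHTABLE
    # A hashtable never converts back while elements are added.
    return HASHTABLE
```

Under these assumed limits, adding "abc" to a 3-element intset gives a listpack, and adding a 129th element to a listpack gives a hashtable.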

This PR increments the RDB version and affects the output of `OBJECT ENCODING`.

Fixes #11155.

Benchmark

redis-server started with default config on my laptop.

`redis-benchmark -P 10 -q --threads 2 -n 10000000 -r 200000 COMMAND ARGS...` was
used for benchmarking, where the command and args are given as in the table below.

Throughput (requests per second)

| Command | Unstable | This PR | Difference |
|---|---|---|---|
| `sadd __rand_int__ __rand_int__` | 579407.81 | 655479.81 | +13% |
| `smembers __rand_int__` | 83266.02 | 220735.92 | +165% |
| `sismember __rand_int__ __rand_int__` | 676956.38 | 570580.88 | -16% |
| `spop __rand_int__` | 412014.34 | 563031.38 | +37% |

Memory usage

Memory usage after the 10M SADD commands, resulting in 200k keys with ~50 set
elements per key, checked using `./redis-cli info memory | grep ^used_memory_human`, shows memory usage down by 70%.

| Measurement | Unstable | This PR | Difference |
|---|---|---|---|
| Memory usage, 200k sets with ~50 elements | 539.98M | 160.19M | -70% |
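
The Difference column can be rechecked from the raw numbers. A small sketch that uses only the figures reported above; the helper name `percent_change` is made up for illustration:

```python
def percent_change(before: float, after: float) -> int:
    """Relative change from `before` to `after`, rounded to a whole percent."""
    return round((after - before) / before * 100)


# (unstable, this PR) pairs copied from the benchmark results above.
results = {
    "sadd": (579407.81, 655479.81),
    "smembers": (83266.02, 220735.92),
    "sismember": (676956.38, 570580.88),
    "spop": (412014.34, 563031.38),
    "memory (M)": (539.98, 160.19),
}

for name, (unstable, this_pr) in results.items():
    print(f"{name}: {percent_change(unstable, this_pr):+d}%")
```

This prints +13%, +165%, -16%, +37% and -70%, matching the reported differences.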

Remarks

The keys and the set elements are given as `__rand_int__`, which results in strings
of the form "000000129245". Since they have leading zeroes, these strings are
not integer-encoded in a listpack. With numeric values without leading zeroes,
you can expect even greater memory savings. Huge memory saving is expected,
since listpack is a compact representation designed to save memory.
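
Why leading zeroes matter can be illustrated with a round-trip check: a string can only be stored as an integer if converting it to a number and back reproduces it exactly. This is a sketch of the idea, not the listpack code; `is_canonical_integer` is a hypothetical helper and the signed 64-bit range is an assumption.

```python
def is_canonical_integer(element: str) -> bool:
    """True if `element` survives a string -> int -> string round trip."""
    try:
        value = int(element)
    except ValueError:
        return False
    # Assumed range: signed 64-bit integers.
    if not -(2**63) <= value < 2**63:
        return False
    # "000000129245" becomes 129245, which prints as "129245": not the same.
    return str(value) == element
```

With this check, "129245" qualifies for integer storage while "000000129245" has to be kept as a 12-byte string.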

The huge speed improvement for SMEMBERS is expected, since a listpack is a
single allocation and traversing it means very few cache misses. Traversing the
keys in a hash table OTOH means several memory lookups.

The degradation for SISMEMBER can be expected, since a hashtable-lookup is
naturally faster than a linear search through a listpack of ~50 elements.
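
The trade-off can be pictured with a toy packed buffer. The layout below (one length byte followed by the element bytes) is an assumption chosen for brevity and is not the real listpack format; all function names are hypothetical.

```python
from typing import Iterable, Iterator


def pack(elements: Iterable[str]) -> bytes:
    """Store all elements back to back in one contiguous buffer."""
    buf = bytearray()
    for element in elements:
        data = element.encode()
        if len(data) > 255:
            raise ValueError("toy format only supports short elements")
        buf.append(len(data))  # one length byte
        buf += data            # followed by the element itself
    return bytes(buf)


def iterate(buf: bytes) -> Iterator[str]:
    """Walk the buffer front to back, as a full traversal would."""
    pos = 0
    while pos < len(buf):
        length = buf[pos]
        yield buf[pos + 1:pos + 1 + length].decode()
        pos += 1 + length


def contains(buf: bytes, element: str) -> bool:
    """Membership is a linear scan: cost grows with the number of entries."""
    return any(candidate == element for candidate in iterate(buf))
```

Returning every member is a single pass over one allocation, while a membership test may have to visit every entry, which a hash table lookup avoids.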

@zuiderkwast zuiderkwast requested a review from sundb September 20, 2022 13:33
@zuiderkwast zuiderkwast marked this pull request as ready for review September 21, 2022 15:03
@sundb
Collaborator

sundb commented Sep 26, 2022

missing set-listpack in redis-check-rdb.c (referenced code line: `"quicklist-v2"`)

@zuiderkwast
Contributor Author

> missing set-listpack in redis-check-rdb.c

Done.

Member

@oranagra oranagra left a comment

@zuiderkwast thank you for taking the time.
here's my initial review (didn't look at the tests yet).

there are a bunch of minor comments here, but also some critical ones about the efficiency (specifically of random picks in listpack, and about popping them out).

p.s. eventually, this PR top comment should include some benchmark that mentions both memory savings and also throughput / latency costs.

* Don't wrap the not-very-long line
* Update RDB_VERSION
* Change set-max-listpack-entries to 512
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Dec 5, 2022
In redis#11290, we added listpack encoding for SET objects
but forgot to support it in zuiFind. As a result, commands such
as ZDIFF crash when they operate on a listpack-encoded SET.

This PR adds support for listpack-encoded SETs in zuiFind and adds tests
for the related commands to cover this case.

Fixes redis#11578
oranagra added a commit that referenced this pull request Dec 9, 2022
…11581)

In #11290, we added listpack encoding for SET objects
but forgot to support it in zuiFind, which causes ZINTER, ZINTERSTORE,
ZINTERCARD, ZDIFF and ZDIFFSTORE to crash.
We also forgot to support it in RM_ScanKey, which causes it to hang.

This PR adds support for listpack-encoded SETs in zuiFind and in RM_ScanKey,
and adds tests for the related commands to cover this case.

Other changes:
- There is no reason for zuiFind to go into the internals of the SET.
  It can simply use setTypeIsMember and not care about the encoding.
- Remove the `#include "intset.h"` from server.h to reduce the chance of
  accidental intset API use.
- Move the setTypeAddAux, setTypeRemoveAux and setTypeIsMemberAux
  interfaces to the header.
- In scanGenericCommand, use setTypeInitIterator and setTypeNext
  to handle OBJ_SET scan.
- In RM_ScanKey, improve the hash scan mode by using lpGetValue like zset,
  so they can share code and perform better.
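
The first point, letting callers go through one membership function instead of switching on the encoding themselves, can be sketched as follows. `ToySet`, `set_type_is_member` and `count_common` are hypothetical Python stand-ins written for this illustration; they only mirror the idea behind setTypeIsMember and are not the Redis implementation.

```python
import bisect
from typing import Iterable


class ToySet:
    """Hypothetical set object with three possible encodings."""

    def __init__(self, encoding: str, members: Iterable[str]):
        self.encoding = encoding
        if encoding == "intset":
            self.data = sorted(int(m) for m in members)  # sorted integers
        elif encoding == "listpack":
            self.data = list(members)                    # flat sequence
        elif encoding == "hashtable":
            self.data = set(members)                     # hashed lookup
        else:
            raise ValueError(f"unknown encoding: {encoding}")


def set_type_is_member(s: ToySet, element: str) -> bool:
    """The only place that knows about encodings."""
    if s.encoding == "intset":
        try:
            value = int(element)
        except ValueError:
            return False
        if str(value) != element:
            return False
        i = bisect.bisect_left(s.data, value)  # binary search
        return i < len(s.data) and s.data[i] == value
    # listpack: linear scan; hashtable: hashed lookup.
    return element in s.data


def count_common(elements: Iterable[str], s: ToySet) -> int:
    """A caller that works for every encoding without inspecting it."""
    return sum(1 for e in elements if set_type_is_member(s, e))
```

When a new encoding is added, only `set_type_is_member` needs to change; callers such as `count_common` cannot forget a case.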

The zuiFind part fixes #11578

Co-authored-by: Oran Agra <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
oranagra added a commit that referenced this pull request Jan 5, 2023
PR #11290 added listpack encoding for sets, but was missing two things:
1. Correct handling of MEMORY USAGE (leading to an assertion).
2. A bound on the scratch buffer size in SRANDMEMBER; the uncontrolled size led to an
   OOM panic (reported in #11668). Fixed by copying logic from ZRANDMEMBER.

Note that neither issue existed in any Redis release.
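
The idea behind bounding the scratch buffer can be sketched as batching: instead of allocating room for the whole requested count at once, random picks are produced in chunks of a capped size. This is an illustration under assumptions, not the SRANDMEMBER or ZRANDMEMBER code; the function name and the cap of 1000 are hypothetical.

```python
import random
from typing import Iterator, List, Sequence

BATCH_LIMIT = 1000  # assumed cap on the scratch buffer, in elements


def random_members_with_repeats(
    members: Sequence[str],
    count: int,
    rng: random.Random,
    batch_limit: int = BATCH_LIMIT,
) -> Iterator[List[str]]:
    """Yield `count` random picks (repeats allowed) in bounded batches."""
    remaining = count
    while remaining > 0:
        batch_size = min(remaining, batch_limit)
        # The scratch buffer never holds more than `batch_limit` entries,
        # no matter how large `count` is.
        scratch = [rng.choice(members) for _ in range(batch_size)]
        yield scratch
        remaining -= batch_size
```

A request for 2500 picks is then served as batches of 1000, 1000 and 500, so memory use no longer scales with the requested count.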
madolson pushed a commit to madolson/redis that referenced this pull request Apr 19, 2023
madolson pushed a commit to madolson/redis that referenced this pull request Apr 19, 2023
madolson pushed a commit to madolson/redis that referenced this pull request Apr 19, 2023
redis#11519)

The following example will create an empty set (listpack encoding):
```
> RESTORE key 0
"\x14\x25\x25\x00\x00\x00\x00\x00\x02\x01\x82\x5F\x37\x03\x06\x01\x82\x5F\x35\x03\x82\x5F\x33\x03\x00\x01\x82\x5F\x31\x03\x82\x5F\x39\x03\x04\xA9\x08\x01\xFF\x0B\x00\xA3\x26\x49\xB4\x86\xB0\x0F\x41"
OK
> SCARD key
(integer) 0
> SRANDMEMBER key
Error: Server closed the connection
```

In the spirit of redis#9297, skip empty set when loading RDB_TYPE_SET_LISTPACK.
Introduced in redis#11290
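
The fix described above can be pictured as a filter in the loader. A minimal sketch, assuming the dump has already been decoded into member lists; `load_sets` and its arguments are hypothetical and not Redis code:

```python
from typing import Dict, List, Tuple


def load_sets(decoded: Dict[str, List[str]]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Keep non-empty sets and report the keys that were skipped."""
    loaded: Dict[str, List[str]] = {}
    skipped: List[str] = []
    for key, members in decoded.items():
        if not members:
            # An empty set must never become a live key: commands such as
            # SRANDMEMBER assume at least one element is present.
            skipped.append(key)
            continue
        loaded[key] = members
    return loaded, skipped
```

Skipping the key at load time means later commands never see a set with zero elements.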
madolson pushed a commit to madolson/redis that referenced this pull request Apr 19, 2023
This payload produces a set with duplicate elements (listpack encoding):
```
restore _key 0 "\x14\x25\x25\x00\x00\x00\x0A\x00\x06\x01\x82\x5F\x35\x03\x04\x01\x82\x5F\x31\x03\x82\x5F\x33\x03\x00\x01\x82\x5F\x39\x03\x82\x5F\x33\x03\x08\x01\x02\x01\xFF\x0B\x00\x31\xBE\x7D\x41\x01\x03\x5B\xEC"

smembers _key
1) "6"
2) "_5"
3) "4"
4) "_1"
5) "_3"  ---> dup
6) "0"
7) "_9"
8) "_3"  ---> dup
9) "8"
10) "2"
```

Sets like this cause SDIFF to hang: SDIFF generated a broken
protocol reply and left the client hung. (The client expected ten elements but
received only nine because of the duplication.)

If we set `sanitize-dump-payload` to yes, we will be able to find
the duplicate elements and report "ERR Bad data format".
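
The duplicate check can be sketched as follows, using the ten members from the example above. The function and exception names are hypothetical; only the `sanitize-dump-payload` behaviour and the error text come from the description.

```python
from typing import List


class BadDataFormat(ValueError):
    """Raised when a payload fails validation."""


def validate_set_payload(members: List[str], sanitize: bool) -> List[str]:
    """Reject payloads with duplicate members when sanitizing is enabled."""
    if sanitize and len(set(members)) != len(members):
        raise BadDataFormat("ERR Bad data format")
    return list(members)


# Members as listed by SMEMBERS in the example above.
payload = ["6", "_5", "4", "_1", "_3", "0", "_9", "_3", "8", "2"]

announced = len(payload)    # 10 entries are stored
distinct = len(set(payload))  # only 9 of them are different
```

The gap between the 10 stored entries and the 9 distinct ones is what leaves a client waiting for an element that never arrives.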

Discovered and discussed in redis#11290.

This PR also improves the output when corrupt-dump-fuzzer hangs: it now
prints the commands and the payload, for example:
```
Testing integration/corrupt-dump-fuzzer
[TIMEOUT]: clients state report follows.
sock6 => (SPAWNED SERVER) pid:28884
Killing still running Redis server 28884
commands caused test to hang:
SDIFF __key 
payload that caused test to hang: "\x14\balabala"
```

Co-authored-by: Oran Agra <[email protected]>
madolson pushed a commit to madolson/redis that referenced this pull request Apr 19, 2023
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request May 19, 2023
In redis#11290, we added listpack encoding to set, and added CASE 2.5
to SRANDMEMBER. This algorithm provides better performance, but
doesn't provide random order.

The order in which the members are returned is important. We provide
this random order in HRANDFIELD and ZRANDMEMBER case 3. So for
SRANDMEMBER we need to do the same.

Note that this change will degrade CASE 3 performance by a factor of 3.

Unrelated changes: remove useless setTypeRandomElements and fix a typo.
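
The difference between a random selection and a random order can be sketched like this. Neither function is the algorithm used in Redis; both are hypothetical illustrations, with a one-pass selection standing in for "fast but ordered" and a shuffle standing in for the extra work needed for random order.

```python
import random
from typing import List, Sequence


def pick_in_storage_order(members: Sequence[str], count: int,
                          rng: random.Random) -> List[str]:
    """One-pass selection of `count` distinct members (count <= len(members)).

    Every subset is equally likely, but the result keeps storage order.
    """
    picked: List[str] = []
    remaining = count
    for index, member in enumerate(members):
        left = len(members) - index
        # Take this member with probability remaining / left.
        if rng.random() * left < remaining:
            picked.append(member)
            remaining -= 1
            if remaining == 0:
                break
    return picked


def pick_in_random_order(members: Sequence[str], count: int,
                         rng: random.Random) -> List[str]:
    """Same selection, followed by a shuffle so the order is random too."""
    picked = pick_in_storage_order(members, count, rng)
    rng.shuffle(picked)
    return picked
```

A caller that takes only the first few returned members gets a biased view from the first function, since early members in storage order always come first.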
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
@sundb sundb mentioned this pull request Aug 18, 2023
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Feb 22, 2024
It seems to be a leak caused by code refactoring in redis#11290.
oranagra pushed a commit that referenced this pull request Feb 22, 2024
It seems to be a leak caused by code refactoring in #11290.
it's a small leak, that only happens if there's an IO error.
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
Labels

approval-needed: Waiting for core team approval to be merged
release-notes: indication that this issue needs to be mentioned in the release notes
state:major-decision: Requires core team consensus

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Improved memory efficiency for small sets

5 participants