-
Notifications
You must be signed in to change notification settings - Fork 24.4k
Listpack encoding for sets #11290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Listpack encoding for sets #11290
Conversation
when checking value size against set_max_listpack_value Co-authored-by: sundb <[email protected]>
|
missing Line 100 in 1de675b
|
Done. |
oranagra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zuiderkwast thank you for taking the time.
here's my initial review (didn't look at the tests yet).
there are a bunch of minor comments here, but also some critical ones about the efficiency (specifically of random picks in listpack, and about popping them out).
p.s. eventually, this PR top comment should include some benchmark that mentions both memory savings and also throughput / latency costs.
* Don't wrap the not-very-long line * Update RDB_VERSION * Change set-max-listpack-entries to 512
In redis#11290, we added listpack encoding for SET object, but forgot to support it in zuiFind. Causing for example ZDIFF command to act on the SET object, it will crash. This PR add support SET listpack in zuiFind, and add tests for related commands to cover this case. Fixes redis#11578
…11581) In #11290, we added listpack encoding for SET object. But forgot to support it in zuiFind, causes ZINTER, ZINTERSTORE, ZINTERCARD, ZIDFF, ZDIFFSTORE to crash. And forgot to support it in RM_ScanKey, causes it hang. This PR add support SET listpack in zuiFind, and in RM_ScanKey. And add tests for related commands to cover this case. Other changes: - There is no reason for zuiFind to go into the internals of the SET. It can simply use setTypeIsMember and don't care about encoding. - Remove the `#include "intset.h"` from server.h reduce the chance of accidental intset API use. - Move setTypeAddAux, setTypeRemoveAux and setTypeIsMemberAux interfaces to the header. - In scanGenericCommand, use setTypeInitIterator and setTypeNext to handle OBJ_SET scan. - In RM_ScanKey, improve hash scan mode, use lpGetValue like zset, they can share code and better performance. The zuiFind part fixes #11578 Co-authored-by: Oran Agra <[email protected]> Co-authored-by: Viktor Söderqvist <[email protected]>
PR #11290 added listpack encoding for sets, but was missing two things: 1. Correct handling of MEMORY USAGE (leading to an assertion). 2. Had an uncontrolled scratch buffer size in SRANDMEMBER leading to OOM panic (reported in #11668). Fixed by copying logic from ZRANDMEMBER. note that both issues didn't exist in any redis release.
Small sets with not only integer elements are listpack encoded, by default
up to 128 elements, max 64 bytes per element, new config `set-max-listpack-entries`
and `set-max-listpack-value`. This saves memory for small sets compared to using a hashtable.
Sets with only integers, even very small sets, are still intset encoded (up to 1G
limit, etc.). Larger sets are hashtable encoded.
This PR increments the RDB version, and has an effect on OBJECT ENCODING
Possible conversions when elements are added:
intset -> listpack
listpack -> hashtable
intset -> hashtable
Note: No conversion happens when elements are deleted. If all elements are
deleted and then added again, the set is deleted and recreated, thus implicitly
converted to a smaller encoding.
Add missing lpFree, introduced in redis#11290
redis#11519) The following example will create an empty set (listpack encoding): ``` > RESTORE key 0 "\x14\x25\x25\x00\x00\x00\x00\x00\x02\x01\x82\x5F\x37\x03\x06\x01\x82\x5F\x35\x03\x82\x5F\x33\x03\x00\x01\x82\x5F\x31\x03\x82\x5F\x39\x03\x04\xA9\x08\x01\xFF\x0B\x00\xA3\x26\x49\xB4\x86\xB0\x0F\x41" OK > SCARD key (integer) 0 > SRANDMEMBER key Error: Server closed the connection ``` In the spirit of redis#9297, skip empty set when loading RDB_TYPE_SET_LISTPACK. Introduced in redis#11290
This payload produces a set with duplicate elements (listpack encoding): ``` restore _key 0 "\x14\x25\x25\x00\x00\x00\x0A\x00\x06\x01\x82\x5F\x35\x03\x04\x01\x82\x5F\x31\x03\x82\x5F\x33\x03\x00\x01\x82\x5F\x39\x03\x82\x5F\x33\x03\x08\x01\x02\x01\xFF\x0B\x00\x31\xBE\x7D\x41\x01\x03\x5B\xEC" smembers key 1) "6" 2) "_5" 3) "4" 4) "_1" 5) "_3" ---> dup 6) "0" 7) "_9" 8) "_3" ---> dup 9) "8" 10) "2" ``` This kind of sets will cause SDIFF to hang, SDIFF generated a broken protocol and left the client hung. (Expected ten elements, but only got nine elements due to the duplication.) If we set `sanitize-dump-payload` to yes, we will be able to find the duplicate elements and report "ERR Bad data format". Discovered and discussed in redis#11290. This PR also improve prints when corrupt-dump-fuzzer hangs, it will print the cmds and the payload, an example like: ``` Testing integration/corrupt-dump-fuzzer [TIMEOUT]: clients state report follows. sock6 => (SPAWNED SERVER) pid:28884 Killing still running Redis server 28884 commands caused test to hang: SDIFF __key payload that caused test to hang: "\x14\balabala" ``` Co-authored-by: Oran Agra <[email protected]>
…edis#11581) In redis#11290, we added listpack encoding for SET object. But forgot to support it in zuiFind, causes ZINTER, ZINTERSTORE, ZINTERCARD, ZIDFF, ZDIFFSTORE to crash. And forgot to support it in RM_ScanKey, causes it hang. This PR add support SET listpack in zuiFind, and in RM_ScanKey. And add tests for related commands to cover this case. Other changes: - There is no reason for zuiFind to go into the internals of the SET. It can simply use setTypeIsMember and don't care about encoding. - Remove the `#include "intset.h"` from server.h reduce the chance of accidental intset API use. - Move setTypeAddAux, setTypeRemoveAux and setTypeIsMemberAux interfaces to the header. - In scanGenericCommand, use setTypeInitIterator and setTypeNext to handle OBJ_SET scan. - In RM_ScanKey, improve hash scan mode, use lpGetValue like zset, they can share code and better performance. The zuiFind part fixes redis#11578 Co-authored-by: Oran Agra <[email protected]> Co-authored-by: Viktor Söderqvist <[email protected]>
In redis#11290, we added listpack encoding to set, and added CASE 2.5 to SRANDMEMBER. This algorithm provides better performance, but doesn't provide random order. The order at which the members are returned is important. We provide this random order in HRANDFIELD and ZRANDMEMBER case 3. So for SRANDMEMBER we need to do the same. Note that this change will degrade CASE 3 performance by a factor of 3. Unrelated changes: remove useless setTypeRandomElements and fix a typo.
Small sets with not only integer elements are listpack encoded, by default
up to 128 elements, max 64 bytes per element, new config `set-max-listpack-entries`
and `set-max-listpack-value`. This saves memory for small sets compared to using a hashtable.
Sets with only integers, even very small sets, are still intset encoded (up to 1G
limit, etc.). Larger sets are hashtable encoded.
This PR increments the RDB version, and has an effect on OBJECT ENCODING
Possible conversions when elements are added:
intset -> listpack
listpack -> hashtable
intset -> hashtable
Note: No conversion happens when elements are deleted. If all elements are
deleted and then added again, the set is deleted and recreated, thus implicitly
converted to a smaller encoding.
Add missing lpFree, introduced in redis#11290
redis#11519) The following example will create an empty set (listpack encoding): ``` > RESTORE key 0 "\x14\x25\x25\x00\x00\x00\x00\x00\x02\x01\x82\x5F\x37\x03\x06\x01\x82\x5F\x35\x03\x82\x5F\x33\x03\x00\x01\x82\x5F\x31\x03\x82\x5F\x39\x03\x04\xA9\x08\x01\xFF\x0B\x00\xA3\x26\x49\xB4\x86\xB0\x0F\x41" OK > SCARD key (integer) 0 > SRANDMEMBER key Error: Server closed the connection ``` In the spirit of redis#9297, skip empty set when loading RDB_TYPE_SET_LISTPACK. Introduced in redis#11290
This payload produces a set with duplicate elements (listpack encoding): ``` restore _key 0 "\x14\x25\x25\x00\x00\x00\x0A\x00\x06\x01\x82\x5F\x35\x03\x04\x01\x82\x5F\x31\x03\x82\x5F\x33\x03\x00\x01\x82\x5F\x39\x03\x82\x5F\x33\x03\x08\x01\x02\x01\xFF\x0B\x00\x31\xBE\x7D\x41\x01\x03\x5B\xEC" smembers key 1) "6" 2) "_5" 3) "4" 4) "_1" 5) "_3" ---> dup 6) "0" 7) "_9" 8) "_3" ---> dup 9) "8" 10) "2" ``` This kind of sets will cause SDIFF to hang, SDIFF generated a broken protocol and left the client hung. (Expected ten elements, but only got nine elements due to the duplication.) If we set `sanitize-dump-payload` to yes, we will be able to find the duplicate elements and report "ERR Bad data format". Discovered and discussed in redis#11290. This PR also improve prints when corrupt-dump-fuzzer hangs, it will print the cmds and the payload, an example like: ``` Testing integration/corrupt-dump-fuzzer [TIMEOUT]: clients state report follows. sock6 => (SPAWNED SERVER) pid:28884 Killing still running Redis server 28884 commands caused test to hang: SDIFF __key payload that caused test to hang: "\x14\balabala" ``` Co-authored-by: Oran Agra <[email protected]>
…edis#11581) In redis#11290, we added listpack encoding for SET object. But forgot to support it in zuiFind, causes ZINTER, ZINTERSTORE, ZINTERCARD, ZIDFF, ZDIFFSTORE to crash. And forgot to support it in RM_ScanKey, causes it hang. This PR add support SET listpack in zuiFind, and in RM_ScanKey. And add tests for related commands to cover this case. Other changes: - There is no reason for zuiFind to go into the internals of the SET. It can simply use setTypeIsMember and don't care about encoding. - Remove the `#include "intset.h"` from server.h reduce the chance of accidental intset API use. - Move setTypeAddAux, setTypeRemoveAux and setTypeIsMemberAux interfaces to the header. - In scanGenericCommand, use setTypeInitIterator and setTypeNext to handle OBJ_SET scan. - In RM_ScanKey, improve hash scan mode, use lpGetValue like zset, they can share code and better performance. The zuiFind part fixes redis#11578 Co-authored-by: Oran Agra <[email protected]> Co-authored-by: Viktor Söderqvist <[email protected]>
PR redis#11290 added listpack encoding for sets, but was missing two things: 1. Correct handling of MEMORY USAGE (leading to an assertion). 2. Had an uncontrolled scratch buffer size in SRANDMEMBER leading to OOM panic (reported in redis#11668). Fixed by copying logic from ZRANDMEMBER. note that both issues didn't exist in any redis release.
It seems to be a leak caused by code refactoring in redis#11290.
It seems to be a leak caused by code refactoring in #11290. it's a small leak, that only happens if there's an IO error.
It seems to be a leak caused by code refactoring in redis#11290. it's a small leak, that only happens if there's an IO error.
Small sets with not only integer elements are listpack encoded, by default up to 128 elements, max 64 bytes per element, new config
set-max-listpack-entriesandset-max-listpack-value. This saves memory for small sets compared to using a hashtable.Sets with only integers, even very small sets, are still intset encoded (up to 1G limit, etc.). Larger sets are hashtable encoded.
Possible conversions when elements are added:
Note: No conversion happens when elements are deleted. If all elements are deleted and then added again, the set is deleted and recreated, thus implicitly converted to a smaller encoding.
This PR increments the RDB version, and has an effect on OBJECT ENCODING
Fixes #11155.
Benchmark
redis-serverstarted with default config on my laptop.redis-benchmark -P 10 -q --threads 2 -n 10000000 -r 200000 COMMAND ARGS...wasused for benchmarking, where command and args are given as in the table below.
Throughput (requests per second)
sadd __rand_int__ __rand_int__smembers __rand_int__sismember __rand_int__ __rand_int__spop __rand_int__Memory usage
Memory usage after the 10M
SADDcommands, resulting in 200k keys with ~50 setelements per key, checked using
./redis-cli info memory | grep ^used_memory_humanshows memory usage down by 70%.Remarks
The keys and the set elements given as
__rand_int__which results in stringson the form "000000129245". Since they have leading zeroes, these strings are
not integer-encoded in a listpack. With numeric values without leading zeroes,
you can expect even greater memory savings. Huge memory saving is expected,
since listpack is a compact representation designed to save memory.
The huge speed improvement for
SMEMBERSis expected, since a listpack is asingle allocation and traversing it means very few cache misses. Traversing the
keys in a hash table OTOH means several memory lookups.
The degradation for
SISMEMBERcan be expected, since a hashtable-lookup isnaturally faster than a linear search through a listpack of ~50 elements.