
Conversation


@yairgott yairgott commented Mar 1, 2025

Overview

Sharing memory between the module and engine reduces memory overhead by eliminating redundant copies of stored records in the module. This is particularly beneficial for search workloads that require indexing large volumes of documents.

Vectors

Vector similarity search requires storing large volumes of high-cardinality vectors. For example, a single vector with 512 dimensions consumes 2048 bytes, and typical workloads often involve millions of vectors. Due to the lack of a memory-sharing mechanism between the module and the engine, ValkeySearch currently doubles memory consumption when indexing vectors, significantly increasing operational costs. This limitation introduces adoption friction and reduces ValkeySearch's competitiveness.

Implementation Details

Memory Allocation Strategy

At a fundamental level, there are two primary allocation strategies:

  • [Chosen] Module-allocated memory shared with the engine.
  • Engine-allocated memory shared with the module.

For ValkeySearch, it is crucial that vectors reside in cache-aligned memory to maximize SIMD optimizations. Allowing the module to allocate memory provides greater flexibility for different use cases, though it introduces slightly higher implementation complexity.

Shared SDS

Shared SDS, a new data type, facilitates module-engine memory sharing with thread-safe intrusive reference counting. It preserves SDS semantics and structure while adding ref-counting and a free callback for statistics tracking.

A core component that enables thread-safe buffer sharing could be beneficial for use cases beyond modules. One notable advantage is avoiding deep copies of buffers when IO threading is enabled.

Module API

New Module APIs

  • VM_CreateSharedSDS:
    • Creates a new Shared SDS.
    • Accepts an allocation function for fine-grained control (e.g., cache alignment).
    • Accepts a free callback function to track deallocations.
  • VM_SharedSDSPtrLen: Retrieves the raw buffer pointer and length of a Shared SDS.
  • VM_ReleaseSharedSDS: Decreases the Shared SDS ref-count by 1.

Extended Module APIs

  • VM_HashSet: Supports setting a shared SDS in the hash.
  • VM_HashGet: Retrieves a shared SDS from the hash and increments its ref-count by 1.

Engine Hash Data-Type

ValkeySearch indexes documents that reside in the engine as t_hash data-type records. (JSON is also supported but is out of scope for this discussion.) The t_hash implementation is backed by either a listpack for small datasets or a hashtable for larger ones.

Since listpack performs deep copies, it cannot support intrusive ref-counting semantics. As a result, if listpack is the underlying encoding when a shared SDS is set, e.g. by calling VM_HashSet, the hash is converted to a hashtable. For the same reason, a shared SDS is never stored as an embedded value in a hashtable entry.


yairgott commented Mar 1, 2025

Planning to add unit tests soon. Publishing early to start the discussion rolling.

Sharing memory between the module and engine reduces memory overhead by eliminating
redundant copies of stored entries in the module. This is particularly beneficial
for search workloads that require indexing large volumes of stored data.

Shared SDS, a new data type, facilitates module-engine memory sharing with thread-safe
intrusive reference counting. It preserves SDS semantics and structure while adding
ref-counting and a free callback for statistics tracking.

New module APIs:

- VM_CreateSharedSDS: Creates a new Shared SDS.
- VM_SharedSDSPtrLen: Retrieves the raw buffer pointer and length of a Shared SDS.
- VM_ReleaseSharedSDS: Decreases the Shared SDS ref-count by 1.

Extended module APIs:

- VM_HashSet: Now supports setting a shared SDS in the hash.
- VM_HashGet: Retrieves a shared SDS and increments its ref-count by 1.
@yairgott yairgott force-pushed the engine_module_shared_memory branch from 104a4dd to f9aad1a Compare March 1, 2025 09:24

codecov bot commented Mar 1, 2025

Codecov Report

Attention: Patch coverage is 22.11538% with 81 lines in your changes missing coverage. Please review.

Project coverage is 70.93%. Comparing base (3f6581b) to head (47a9487).
Report is 176 commits behind head on unstable.

Files with missing lines Patch % Lines
src/module.c 6.52% 43 Missing ⚠️
src/sds.c 25.64% 29 Missing ⚠️
src/sds.h 0.00% 5 Missing ⚠️
src/t_hash.c 69.23% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1804      +/-   ##
============================================
- Coverage     70.99%   70.93%   -0.06%     
============================================
  Files           123      123              
  Lines         65651    65749      +98     
============================================
+ Hits          46609    46642      +33     
- Misses        19042    19107      +65     
Files with missing lines Coverage Δ
src/server.h 100.00% <ø> (ø)
src/t_zset.c 96.88% <100.00%> (+0.04%) ⬆️
src/t_hash.c 95.71% <69.23%> (-0.52%) ⬇️
src/sds.h 78.04% <0.00%> (-4.85%) ⬇️
src/sds.c 82.69% <25.64%> (-3.98%) ⬇️
src/module.c 9.60% <6.52%> (-0.01%) ⬇️

... and 19 files with indirect coverage changes



madolson commented Mar 3, 2025

Consensus is that we don't want to rush the implementation and commit to a specific API, since we are technically past the new-API cutoff for the release. If we can converge offline on the design quickly, we'll merge it; otherwise we'll wait until 9.0.

@PingXie PingXie requested review from ranshid and zuiderkwast March 3, 2025 19:54
@yairgott yairgott closed this Mar 3, 2025

yairgott commented Mar 3, 2025

@zuiderkwast, @ranshid, happy to discuss further with you!

It would be valuable for ValkeySearch 1.0 to have such an interface in 8.1; after all, there is no second chance to make a good first impression ;).

@yairgott yairgott reopened this Mar 3, 2025
@yairgott yairgott force-pushed the engine_module_shared_memory branch 6 times, most recently from dd9a9d2 to d601ba1 Compare March 5, 2025 08:13
Signed-off-by: yairgott <[email protected]>
@yairgott yairgott force-pushed the engine_module_shared_memory branch from d601ba1 to 47a9487 Compare March 5, 2025 19:51
@ranshid ranshid left a comment

Some initial high-level comments.

I think the main issue I have is that this new API implementation has some unclear limitations.
For example, it can only support up to sds32 headers (which is probably 99.99% O.K., but still...) and it will not be O.K. to use in all cases where we might have to embed the field (e.g. keys, hash fields, etc.).

I think all embedding flows will go through sdswrite, but we need a way to make sure we assert in all flows that might embed the sds.
We also need to make sure we document this correctly so that module users understand these limitations; they are not trivial and could easily be broken by future core changes that embed some fields for optimization.

return sh->alloc - sh->len;
}
case SDS_TYPE_32_SHARED: {
return 0;
Member

why is that?

Contributor Author

Indicating that there is zero available space.

Contributor

There is an avail field in the struct, so technically there could be unused space in the allocation just like for the other sds types.

If you don't need this, why do you store an avail field in the struct? We can remove the field and save 4 bytes.

size_t value_size = sdsReqSize(value_len, SDS_TYPE_8);
sds embedded_field_sds;
if (field_size + value_size <= EMBED_VALUE_MAX_ALLOC_SIZE) {
if (sdsType(value) != SDS_TYPE_32_SHARED && field_size + value_size <= EMBED_VALUE_MAX_ALLOC_SIZE) {
Member

nit

Suggested change
if (sdsType(value) != SDS_TYPE_32_SHARED && field_size + value_size <= EMBED_VALUE_MAX_ALLOC_SIZE) {
bool embed_value = sdsType(value) != SDS_TYPE_32_SHARED && field_size + value_size <= EMBED_VALUE_MAX_ALLOC_SIZE;
if (embed_value) {

Member

Also theoretically field can also be a shared sds right? I know in your VSS implementation only values are, but in case we would like to generalize this new API we should at-least assert in case the field is shared.

Contributor

If the caller provided a very short shared sds, let's say 5 bytes, why do we need to keep it? Embedding a copy may use less space than storing a pointer to it.

Contributor

Also theoretically field can also be a shared sds right? I know in your VSS implementation only values are, but in case we would like to generalize this new API we should at-least assert in case the field is shared.

In this case, it should still be fine to embed a copy of the field's content, whether it's a shared sds or not, right?

* - `sh`: A pointer to the `sdshdrshared` structure whose reference count
* should be increased.
*/
void sdsRetain(sdshdrshared *sh) {
Member

Is this function accepting sdshdrshared in order to enforce type correctness by the compiler?
I wonder if it would not be better to allow this function to accept any sds type and simply return if the sds type is not shared.
This way we can avoid external application errors from force-casting the wrong sds type, and avoid exposing the SDS_HDR_VAR macro, which AFAIK is mainly used internally in the sds implementation.

Contributor Author

Hmm, I might not be getting to the bottom of what you're saying, but we should definitely leverage the compiler to enforce types. Note that one could cast a non-sds to a shared-sds, which would also break.

@zuiderkwast zuiderkwast May 14, 2025

I don't think we should expose the type sdshdrshared and require caller to use the SDS_HDR_VAR macro to find this struct. IMO, this stuff should have been internal to sds.c and never been part of the public API.

I see in t_hash.c there is code like

        if (sdsType(value) == SDS_TYPE_32_SHARED) {
            SDS_HDR_VAR(32shared, value);
            sdsRetain(sh);
        }

I prefer that we make this void sdsRetainShared(sds *s); and have it internally assert that the sds is shared.

#define SDS_TYPE_16 2
#define SDS_TYPE_32 3
#define SDS_TYPE_64 4
#define SDS_TYPE_32_SHARED 5
Member

Why do we use shared only up to size 32? We could generalize better by supporting SDS_TYPE_64 and avoid this limitation.

Contributor Author

32 is enough for VSS. Note that in some cases the vector cardinality is low, and these additional 4 bytes are not optimal.

Contributor

I would not want to add this complex feature only for VSS.

If you don't need the alloc field, we could remove it and make the len field 64 bits. That makes the type usable for all sizes, including future use cases.

@zuiderkwast zuiderkwast left a comment

Not a full review. Just a few thoughts.

  • I acknowledge the need to share strings between module and core.
  • I think this shared string type should be called ValkeyModuleSharedString. There is nothing SDS-specific in the API so the term SDS has no meaning in the module perspective. SDS is an internal implementation detail.
  • This new type increases the complexity of the module API. We will have one more string type with its own set of functions. I imagine we want to allow shared strings for more things in the future, such as set elements, string values, sorted set elements, etc. and conversion between this and other kinds of strings, so the size and complexity of the API might grow significantly.
  • Internally in the core, we have been moving towards embedding SDS strings into other structures. Shared strings can't be embedded and this adds new complexity in the core.
  • Accessing strings without copying them has been discussed before, but more often about modules accessing values from the core. I imagine we can allow some read-only access as const char * in a way similar to ValkeyModule_StringDMA but for hash values, set elements, etc.


yairgott commented Mar 25, 2025

Some initial highlevel comments.

I think the main issue I have is that this new API implementation has some unclear limitations. For example, it can only support up to sds32 headers (which is probably 99.99% O.K., but still...)

Just to clarify, do you see any issues with extending it by introducing new types of shared SDS, such as sds16, etc.?

and it will not be O.K. to use in all cases where we might have to embed the field (e.g. keys, hash fields, etc.)

Let's discuss separately:

  1. Embedding a shared SDS within an SDS – Do you see any issues with the current PR implementation?
  2. Embedding an SDS within a shared SDS – Do you see any conceptual issues with supporting this?

I think all embedding flows will go through sdswrite, but we need a way to make sure we assert in all flows that might embed the sds. We also need to make sure we document this correctly so that module users understand these limitations; they are not trivial and could easily be broken by future core changes that embed some fields for optimization.


yairgott commented Mar 25, 2025

Not a full review. Just a few thoughts.

  • I acknowledge the need to share strings between module and core.
  • I think this shared string type should be called ValkeyModuleSharedString. There is nothing SDS-specific in the API so the term SDS has no meaning in the module perspective. SDS is an internal implementation detail.

Ack.

  • This new type increases the complexity of the module API. We will have one more string type with its own set of functions. I imagine we want to allow shared strings for more things in the future, such as set elements, string values, sorted set elements, etc. and conversion between this and other kinds of strings, so the size and complexity of the API might grow significantly.

I agree. We should be very cognizant of the situations in which it really provides value.

  • Internally in the core, we have been moving towards embedding SDS strings into other structures. Shared strings can't be embedded and this adds new complexity in the core.

Let's discuss separately:

Embedding a shared SDS within an SDS – Do you see any issues with the current PR implementation?
Embedding an SDS within a shared SDS – Do you see any conceptual issues with supporting this?

  • Accessing strings without copying them has been discussed before, but more often about modules accessing values from the core. I imagine we can allow some read-only access as const char * in a way similar to ValkeyModule_StringDMA but for hash values, set elements, etc.

Hmm, can you clarify this? The reason for introducing a shared sds is to avoid breaking the hash interface, so that the value remains an sds?


yairgott commented Mar 26, 2025

  • I think this shared string type should be called ValkeyModuleSharedString. There is nothing SDS-specific in the API so the term SDS has no meaning in the module perspective. SDS is an internal implementation detail.

Note that ValkeyModuleString is already reference-counted (but not thread-safe), which might be confusing. It is implemented as a special robj with OBJ_STRING as its type. ValkeyModuleSharedString could potentially be a new robj type. Keep in mind that the hash value would still need to reference an SDS object, which in turn must maintain a reference to the robj where reference counting is managed. This could be done by using one of the unused bits to indicate that the reference to the robj is located just before the SDS.

The benefit of this approach is that it avoids the need to introduce a new SDS type, such as sharedsds32.

Cons of this approach:

  1. Requires using one of the remaining unused bits (availability unknown).
  2. While not a blocker, robj is not thread-safe. To address this, the module must ensure that retaining and releasing ValkeyModuleSharedString objects are handled exclusively by the main thread.

An alternative could be renaming the type to something like ValkeyModuleThreadSafeString.

wdyt?

@madolson madolson moved this to Todo in Valkey 9.0 Mar 31, 2025
@yairgott

Resurrecting this PR. To avoid missing the 9.0 release window, we need to resume work on it ASAP.

@zuiderkwast, @ranshid, the key question is whether you have any reservations about, or a better alternative to, the shared sds-based approach. Once we align on this, we can move forward with the remaining topics.

FWIW, I’ve responded to your earlier comments, PTAL.

@zuiderkwast

the key question is whether you have any reservations about, or a better alternative to, the shared sds-based approach

This approach is fine by me. It's probably the best approach out of the alternatives mentioned.

  • I think this shared string type should be called ValkeyModuleSharedString. There is nothing SDS-specific in the API so the term SDS has no meaning in the module perspective. SDS is an internal implementation detail.

Note that ValkeyModuleString is already reference-counted (but not thread-safe), which might be confusing. It is implemented as a special robj with OBJ_STRING as its type. ValkeyModuleSharedString could potentially be a new robj type. Keep in mind that the hash value would still need to reference an SDS object, which in turn must maintain a reference to the robj where reference counting is managed. This could be done by using one of the unused bits to indicate that the reference to the robj is located just before the SDS.

The benefit of this approach is that it avoids the need to introduce a new SDS type, such as sharedsds32.

I don't want it to be an robj. A kind of sds is good, but we don't need to use the name "sds" in the module API.

What's wrong with using the name ValkeyModuleSharedString in the module API for a shared sds string? Both types (ValkeyModuleSharedString and ValkeyModuleString) are opaque, so users don't need to know that they are completely different.

An alternative could be renaming the type to something like ValkeyModuleThreadSafeString.

Yes, I'm fine with this name too, but I think "shared string" is better, because the main purpose is to allow it to be shared between core and module without copying. Thread-safe is a bonus. We can mention it in documentation.

s_free_with_size(sdsAllocPtr(s), sdsAllocSize(s));
}
/* return 1 if the shared sds is freed otherwise 0 */
int sdsReleaseShared(sdshdrshared *sh) {
Contributor

Why do we need to expose this function in the API? The caller can use sdsfree instead, and it does the ref-counting logic for the shared ones. We can slim the API and skip this function.

@zuiderkwast zuiderkwast left a comment

I have concerns about the added complexity. This needs a lot of reviewing, looking at the bigger picture, and refactoring to avoid adding technical debt. That's a great concern to me. The simplicity is the reason this 15-year-old code base is still manageable. I hope we make the right decisions here about the API, which we can't change later. The internals can be improved later.

Is there any existing code or function that takes an sds argument and will not work if we pass a shared-sds to it? Can a shared-sds always be used where an sds is expected?

* returned; otherwise the input string is returned. */
static ValkeyModuleString *value_or_delete(ValkeyModuleString *s) {
if (!strcasecmp(ValkeyModule_StringPtrLen(s, NULL), ":delete:"))
static void *value_or_delete(ValkeyModuleCtx *ctx, ValkeyModuleString *s, int flags) {
Contributor

So this returns different types depending on flags... Not the best type safety... At least add a proper comment to explain the return type logic.

Users read these modules source code as examples to learn how to write modules. We should keep that in mind.

Comment on lines +1891 to +1894
VALKEYMODULE_API ValkeyModuleSharedSDS *(*ValkeyModule_CreateSharedSDS)(ValkeyModuleCtx *ctx, size_t len,
ValkeyModuleSharedSDSAllocFunc allocfn,
ValkeyModuleSharedSDSFreeCBFunc freecbfn) VALKEYMODULE_ATTR;
VALKEYMODULE_API char *(*ValkeyModule_SharedSDSPtrLen)(ValkeyModuleSharedSDS *shared_sds, size_t *len) VALKEYMODULE_ATTR;
@zuiderkwast zuiderkwast May 14, 2025

Do we need the alloc and free callbacks? Maybe we can skip them? There are some cons to keep them:

  • Extra complexity
  • Defrag doesn't work
  • The free callback takes 64 bits to store

You mentioned it is for "fine-grained control (e.g., cache alignment)". Can we just make them always cache-line-size aligned?

If we remove the alloc and free callbacks, the strings are allocated using zmalloc and the memory usage is tracked by valkey rather than the module. I think that can be quite good. What are the drawbacks?

(Another thing that annoyed me is presence of "CB" in FreeCBFunc but not in AllocFunc. Both are callbacks...)

Contributor Author

For performance reasons, it's important that vector allocations are handled by an allocator that ensures they reside in contiguous or nearby memory addresses, promoting better CPU cache efficiency. Valkey-search currently maintains a fixed-size buffer allocator to guarantee this behavior and cache alignment.

About always using cache alignment addresses, I think that it might become an issue if a shared sds is used for other scenarios. Cache alignment has some drawbacks, like:

  1. Increased memory usage: due to padding and internal fragmentation, especially for small allocations.
  2. Allocator overhead
  3. Impact on CPU cache efficiency

Comment on lines +5355 to +5369
char *value_sds;
if (flags & VALKEYMODULE_HASH_SHAREBLE_VALUES) {
if (key->value->encoding == OBJ_ENCODING_LISTPACK) {
/* Convert to hashtable encoding, as list pack encoding performs a deep copy
* of the buffer, breaking ref-counting semantics. */
hashTypeConvert(key->value, OBJ_ENCODING_HASHTABLE);
}
value_sds = ((ValkeyModuleSharedSDS *)value)->buf;
} else {
value_sds = value->ptr;
robj *argv[2] = {field, value};
hashTypeTryConversion(key->value, argv, 0, 1);
}

int updated = hashTypeSet(key->value, field->ptr, value_sds, low_flags);
Contributor

This code has some technical debt already, and here we're adding more.

hashTypeSet already converts to hashtable if needed, so hashTypeTryConversion wasn't needed here before and it isn't needed now, unless I'm missing something.

I think we should move the handling of SDS_TYPE_32_SHARED into hashTypeSet and convert to hashtable there, rather than having this logic here in module.c. This stuff belongs in the hash type code.

Suggested change
char *value_sds;
if (flags & VALKEYMODULE_HASH_SHAREBLE_VALUES) {
if (key->value->encoding == OBJ_ENCODING_LISTPACK) {
/* Convert to hashtable encoding, as list pack encoding performs a deep copy
* of the buffer, breaking ref-counting semantics. */
hashTypeConvert(key->value, OBJ_ENCODING_HASHTABLE);
}
value_sds = ((ValkeyModuleSharedSDS *)value)->buf;
} else {
value_sds = value->ptr;
robj *argv[2] = {field, value};
hashTypeTryConversion(key->value, argv, 0, 1);
}
int updated = hashTypeSet(key->value, field->ptr, value_sds, low_flags);
sds value_sds;
if (flags & VALKEYMODULE_HASH_SHAREBLE_VALUES) {
value_sds = ((ValkeyModuleSharedSDS *)value)->buf;
} else {
value_sds = value->ptr;
}
int updated = hashTypeSet(key->value, field->ptr, value_sds, low_flags);

Contributor Author

This makes sense

#define VALKEYMODULE_HASH_CFIELDS (1 << 2)
#define VALKEYMODULE_HASH_EXISTS (1 << 3)
#define VALKEYMODULE_HASH_COUNT_ALL (1 << 4)
#define VALKEYMODULE_HASH_SHAREBLE_VALUES (1 << 5)
Contributor

Spelling ==> SHAREABLE

But, depending on what name we use for the shared string type, make this macro match that name, e.g.

ValkeyModuleSharedString <--> VALKEYMODULE_HASH_SHARED_STRINGS


sds v;
if (flags & HASH_SET_TAKE_VALUE) {
if (flags & HASH_SET_TAKE_VALUE || sdsType(value) == SDS_TYPE_32_SHARED) {
Contributor

Let's keep the semantics of TAKE_VALUE even for the shared sds:

  • If HASH_SET_TAKE_VALUE is set, take over the ownership without incrementing the reference counter.
  • Otherwise, increment the reference counter. This corresponds to a copy.

Comment on lines +137 to +140
if (sdsType(value) == SDS_TYPE_32_SHARED) {
SDS_HDR_VAR(32shared, value);
sdsRetain(sh);
}
Contributor

This function has this doc comment:

/* takes ownership of value, does not take ownership of field */

It means that we should not increment the reference counter. We take over the existing reference. The caller needs to increment it if they want to keep a reference to it.

Suggested change
if (sdsType(value) == SDS_TYPE_32_SHARED) {
SDS_HDR_VAR(32shared, value);
sdsRetain(sh);
}

@zuiderkwast

What happens if a ref-counted object (robj, AKA ValkeyModuleString) contains a pointer to a shared-sds? It means it's a valid ValkeyModuleString, right? And it can be duplicated without duplicating the underlying shared-sds?

I'm thinking about future things like the string key API. It may be a useful thing for modules that store very large data in strings. Do we need to duplicate all functions to support the new shared-string type or are there other ways...?

int VM_StringSet(ValkeyModuleKey *key, ValkeyModuleString *str);

This one exists, but I haven't found any VM_StringGet. Doesn't it exist? Do you know how to get a string value? Is it possible to use VM_OpenKey and then access the value from there?

To get and set shared-sds strings for string keys, would we need separate APIs or are there alternatives, like temporarily wrapping a shared-sds inside an robj just to fit the existing APIs. Should we have some wrapping API for that? Such as:

ValkeyModuleString *VM_CreateStringFromSharedString(ValkeyModuleCtx *ctx, ValkeyModuleSharedString *shared);

@yairgott

I have concerns about the added complexity. This needs a lot of reviewing, looking at the bigger picture, and refactoring to avoid adding technical debt. That's a great concern to me. The simplicity is the reason this 15-year-old code base is still manageable. I hope we make the right decisions here about the API, which we can't change later. The internals can be improved later.

Agreed

Is there any existing code or function that takes an sds argument and will not work if we pass a shared-sds to it? Can a shared-sds always be used where an sds is expected?

Some SDS functions, such as _sdsMakeRoomFor, can't safely operate with shared SDS because they might trigger a reallocation.

Currently, I've added assertions in all these places and I'm planning to document the constraints and limitations.

@zuiderkwast

Some SDS functions, such as _sdsMakeRoomFor, can't safely operate with shared SDS because they might trigger a reallocation.

Can't we do copy-on-write? If the ref-counter is greater than one, decrement the reference counter, create a new one and return the new one. The other references to the old string remain intact.


yairgott commented May 16, 2025

Some SDS functions, such as _sdsMakeRoomFor, can't safely operate with shared SDS because they might trigger a reallocation.

Can't we do copy-on-write? If the ref-counter is greater than one, decrement the reference counter, create a new one and return the new one. The other references to the old string remain intact.

Yes, it's possible, but doing so could break the caller's expectations, and it isn't consistent with the function's name, which doesn't imply the creation of an additional instance, potentially more than doubling memory consumption.

@zuiderkwast

Can't we do copy-on-write? If the ref-counter is greater than one, decrement the reference counter, create a new one and return the new one. The other references to the old string remain intact.

Yes, it's possible, but doing so could break the caller's expectations, and it isn't consistent with the function's name, which doesn't imply the creation of an additional instance, potentially more than doubling memory consumption.

True. But what if the core modifies the string using e.g. HINCRBY (not sure if it does it in-place) or APPEND (for string keys)? Doesn't this break some expectation by the module? Should the shared-sds be treated as immutable?

Or if the core deletes it using HDEL, how does the module find out? Is the free callback essential for detecting this? If we just decrement the ref-counter, the module will retain the string. This is what I would expect from a ref-counted type, but I'm suspecting this is not exactly what this feature needs. Maybe we need the free-callback but we don't really need the reference counter?

@yairgott
Contributor Author

yairgott commented May 21, 2025

This one exists, but I haven't found any VM_StringGet. Doesn't it exist? Do you know how to get a string value? Is it possible to use VM_OpenKey and then access the value from there?

Yeah, to read the key, you’d first need to use VM_OpenKey, followed by the module API VM_StringDMA.

FWIW, I think we should improve the documentation to support deep-linking to individual module APIs. Currently, on the module API page, there's no way to reference a specific API directly for bookmarking or sharing.

To get and set shared-sds strings for string keys, would we need separate APIs, or are there alternatives, like temporarily wrapping a shared-sds inside an robj just to fit the existing APIs? Should we have some wrapping API for that?

I imagine the value of supporting shared SDS for string keys is relatively low, since keys are typically short. For string key values, it probably makes sense to use VM_StringDMA.

@yairgott
Contributor Author

True. But what if the core modifies the string using e.g. HINCRBY (not sure if it does it in-place) or APPEND (for string keys), doesn't this break some expectation by the module? Should the shared-sds be treated as immutable?

In general, modifying a shared SDS can violate assumptions made by modules. However, I don't think this applies to the valkey-search use case, since vectors are stored as strings in a hash, and modifying a hash field replaces the existing value rather than mutating it in place.

Maybe we need the free-callback but we don't really need the reference counter?

I think a simple free-callback could work even without reference counting. That said, ref counting isn't the main source of complexity, and it does make shared SDS more versatile for supporting other potential use cases.

@zuiderkwast
Contributor

Yeah, to read the key, you’d first need to use VM_OpenKey, followed by the module API VM_StringDMA.

Thanks. I've seen this, but assumed there is another API to use when you don't need direct memory access.

FWIW, I think we should improve the documentation to support deep-linking to individual module APIs. Currently, on the module API page, there's no way to reference a specific API directly for bookmarking or sharing.

There are anchor link targets like https://valkey.io/topics/modules-api-ref/#ValkeyModule_StringDMA but they're not displayed next to the function names. That's a website issue. Open an issue or PR in the website repo if you want.

Now, you can find these links under Function index at the bottom of the page.

if the core deletes it using HDEL, how does the module find out? Is the free callback essential for detecting this?

You didn't answer these questions. I'm curious. Do you rely on keyspace notifications for this?

In general, modifying a shared SDS can violate assumptions made by modules. However, I don't think this applies to the valkey-search use case, since vectors are stored as strings in a hash, and modifying a hash field replaces the existing value rather than mutating it in place.

What's worse IMHO is to violate assumptions made by the core. 😆 A new variant of sds should behave like an sds and not break the abstraction and expectations. If a shared sds masquerades as an sds but breaks important expectations, that's a complexity problem and a breach of the abstraction: if we go down this path, any function that works with an sds value would need to check whether it's a normal sds or a shared one, and whether it's mutable, before doing anything. That's what I'm concerned about here. If it's an sds, it needs to behave as an sds. Admittedly, the embedded sds strings have this problem too; they are effectively immutable.

Regarding mutability, if we want to give an impression that it should be treated as immutable, then the proposed module API for creating a shared sds is a bit misleading, because it creates an uninitialized value that needs to be mutated:

ValkeyModuleSharedSDS *VM_CreateSharedSDS(ValkeyModuleCtx *ctx, size_t len, ValkeyModuleSharedSDSAllocFunc allocfn, ValkeyModuleSharedSDSFreeCBFunc freecbfn);

After calling this, the module needs to mutate the content. An API like this gives the impression that it's OK to mutate it also after adding it to a hash.

If we want to treat the value as immutable, this should return const char *:

char *VM_SharedSDSPtrLen(ValkeyModuleSharedSDS *shared_sds, size_t *len);

We can easily make them mutable though by handling them as a proper reference-counted system. Code that modifies an sds typically uses sdscatlen, sdscatfmt and such functions to build a string. These in turn call sdsMakeRoomFor and sdsResize. If we make sure that these functions duplicate the string if it has more than one ref-count, doing copy-on-write, then we can preserve the expected behavior.
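The copy-on-write step described here can be sketched outside the real sds code. Below is a hedged illustration using an invented `rcstr` type, not the actual sds header layout: before any grow operation, the string is duplicated if it has more than one holder, so other references keep seeing the original bytes.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative ref-counted string; NOT the real sds layout. */
typedef struct {
    unsigned refcount;
    size_t len, alloc;
    char buf[];
} rcstr;

rcstr *rcstrNew(const char *s) {
    size_t len = strlen(s);
    rcstr *r = malloc(sizeof(rcstr) + len + 1);
    r->refcount = 1;
    r->len = r->alloc = len;
    memcpy(r->buf, s, len + 1);
    return r;
}

/* Analogue of sdsMakeRoomFor: unshare first (copy-on-write), then grow. */
rcstr *rcstrMakeRoomFor(rcstr *r, size_t addlen) {
    if (r->refcount > 1) {
        rcstr *copy = rcstrNew(r->buf);
        r->refcount--;          /* the other holders keep the original */
        r = copy;               /* the caller gets a private copy */
    }
    if (r->alloc - r->len < addlen) {
        r = realloc(r, sizeof(rcstr) + r->len + addlen + 1);
        r->alloc = r->len + addlen;
    }
    return r;
}
```

In this sketch the caller always uses the returned pointer, mirroring how sds growth functions already return a possibly-relocated string.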

If we don't want this, then the reference-counter has no meaning to the core, so we should remove it. The module can still manage a reference-counter using its alloc and free callbacks. It can allocate some extra space to store it and the free callback can decrement the reference counter and free it if it reaches zero.
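That last idea, the module keeping its own reference counter in extra space obtained from its alloc callback, could look roughly like this. All names here are hypothetical sketches, not proposed APIs; the core only ever sees the payload pointer and calls the free callback when it drops the value.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: the module over-allocates in its alloc callback and hides
 * an atomic refcount in front of the payload it hands to the core. */
typedef struct {
    atomic_uint refcount;
    char buf[];
} ModuleHdr;

void *moduleSharedAlloc(size_t len) {
    ModuleHdr *h = malloc(sizeof(ModuleHdr) + len);
    atomic_init(&h->refcount, 1);
    return h->buf;              /* the core sees only the payload */
}

static ModuleHdr *hdrOf(void *payload) {
    return (ModuleHdr *)((char *)payload - offsetof(ModuleHdr, buf));
}

/* Called by the module when it stores the pointer in its index. */
void moduleSharedRetain(void *payload) {
    atomic_fetch_add(&hdrOf(payload)->refcount, 1);
}

/* Installed as the free callback: the core invokes it when the value is
 * deleted; the module invokes it when it removes the vector from its index. */
void moduleSharedRelease(void *payload) {
    if (atomic_fetch_sub(&hdrOf(payload)->refcount, 1) == 1)
        free(hdrOf(payload));
}
```

With this shape, the core stays refcount-free, exactly as suggested above.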

I imagine the value of supporting shared SDS for string keys is relatively low, since keys are typically short. For string key values, it probably makes sense to use VM_StringDMA.

I meant the string values. With VM_StringDMA, the core owns the allocation. The module can't keep a reference to it for a long time, because it can be reallocated and freed by the core. See the rules:

 * DMA access rules:
 *
 * 1. No other key writing function should be called since the moment
 * the pointer is obtained, for all the time we want to use DMA access
 * to read or modify the string.
 *
 * 2. Each time VM_StringTruncate() is called, to continue with the DMA
 * access, VM_StringDMA() should be called again to re-obtain
 * a new pointer and length.
 *
 * 3. If the returned pointer is not NULL, but the length is zero, no
 * byte can be touched (the string is empty, or the key itself is empty)
 * so a VM_StringTruncate() call should be used if there is to enlarge
 * the string, and later call StringDMA() again to get the pointer.

VM_StringDMA works with reference-counted objects. To allow mutations, it calls dbUnshareStringValue to duplicate the string if o->refcount != 1, i.e. it does something like copy-on-write.

If this is acceptable for strings, then it should be acceptable for hashes too, right? Something like VM_HashDMA with similar rules? You can't keep it in an index though since you can't know when it's free'd by the core.

@yairgott
Contributor Author

Open an issue or PR in the website repo if you want

Will do.

Do you rely on keyspace notifications for this?

Yes that is correct.

What's worse IMHO is to violate assumptions made by the core.

Agreed. Perhaps I didn’t explain my concern clearly: it’s mainly about the feasibility of enforcing immutability for a shared SDS, especially given that SDS is commonly used as a char *. As discussed below, there's also the challenge of preserving caller assumptions regarding memory overhead.

Regarding mutability, if we want to give an impression that it should be treated as immutable, then the proposed module API for creating a shared sds is a bit misleading, ...

Agreed and I'll update the interface. FWIW, the original motivation for proposing this API was to avoid memory copies when the vector is stored using the JSON data type rather than the engine's hash data type. Now that valkey-json is open source, we have the option to extend it with the necessary capabilities.

These in turn call sdsMakeRoomFor and sdsResize. If we make sure that these functions duplicate the string if it has more than one ref-count, doing copy-on-write, then we can preserve the expected behavior.

IMO, it's not so clear cut in this case. Duplication is certainly possible, but doing so unconditionally could break the caller’s assumptions about memory overhead, which may cause friction and issues.

What do you think about an approach where duplication is only allowed if explicitly permitted by the caller, either through an additional function parameter or by embedding an indicator within the shared SDS itself?

You can't keep it in an index though since you can't know when it's free'd by the core.

Exactly, and that’s why it ends up being impractical for our use case.

@yairgott
Contributor Author

IMO, implicit SDS duplication doesn’t resonate well. Take, for example, std::shared_ptr&lt;std::string&gt; in C++: any mutation to the referenced string is reflected across all holders of that shared pointer. When a deep copy is required, it’s typically done through a function explicitly named to reflect that behavior, helping avoid misuse.

In contrast, SDS function names like _sdsMakeRoomFor don’t clearly indicate duplication, which could easily become a pitfall.

In a way, the duality introduced by shared SDS is a limitation: it creates situations where the caller must be aware of whether the SDS in hand is shared or not.

@zuiderkwast
Contributor

zuiderkwast commented May 23, 2025

Perhaps I didn’t explain my concern clearly: it’s mainly about the feasibility of enforcing immutability for a shared SDS, especially given that SDS is commonly used as a char *. As discussed below, there's also the challenge of preserving caller assumptions regarding memory overhead.

Agreed, let's not do implicit duplication.

In a way, the duality introduced by shared SDS is a limitation: it creates a situation where the caller must be aware of whether the SDS in hand is shared or not in some situations.

Yeah, it's an sds but it's not really an sds. That's a limitation and/or broken abstraction.

Thinking about this, does it really have to pretend to be an sds at all? Can't we simply come up with a new string type?

If we free it from the sds burden, we can design it in the way we need here:

  • Immutable, use const for the content.
  • Can't be used directly as a char *.
  • Type safety. Code that uses it knows what they have.
  • Reference-counter, if you want.

typedef struct {
    uint32_t length;
    uint32_t refcount;
    void (*freeCallback)(void *);
    const char content[];
} sharedString;

It can be a single allocation where the last field is of dynamic length. It can be passed around as a proper struct-pointer, rather than a typedef'ed char *.
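Under the assumption of the struct above, creation and release helpers might look like this. The helper names are hypothetical; the only write to `content` happens at creation, before the value is published, so the type stays effectively immutable afterwards.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint32_t length;
    uint32_t refcount;
    void (*freeCallback)(void *);
    const char content[];       /* single allocation, dynamic length */
} sharedString;

sharedString *sharedStringNew(const char *data, uint32_t len,
                              void (*freecb)(void *)) {
    sharedString *s = malloc(sizeof(sharedString) + len);
    s->length = len;
    s->refcount = 1;
    s->freeCallback = freecb;
    memcpy((char *)s->content, data, len); /* write once, then immutable */
    return s;
}

void sharedStringRetain(sharedString *s) { s->refcount++; }

void sharedStringRelease(sharedString *s) {
    if (--s->refcount == 0) {
        if (s->freeCallback) s->freeCallback(s);
        free(s);
    }
}
```

The free callback here runs on the last release, which is where a module would update its memory-usage statistics.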

To store it in a hash entry, we just need to use another flag (the field's sds aux-bits) to indicate that it's the kind of hash entry where the value is a sharedString-pointer.

If we ever want this for top-level key-values, we use a new robj encoding for it.

@yairgott
Contributor Author

If we free it from the sds burden, we can design it in the way we need here

Agreed. I can also share that our internal implementation is based on a dedicated data type that is not SDS-based. Initially, the shared SDS approach seemed more appealing, but after our discussion, I’ve reached the same conclusion.

Regarding the implementation: one of the main challenges was enabling the hash value to hold either an SDS or the alternate type without incurring additional memory overhead. We addressed this by leveraging the fact that pointer addresses are memory-aligned. Specifically, when storing a reference to the alternate type, we increment the pointer by 1 and then decrement it upon access. This handling is mostly scoped within the hash type code; IIRC there was just one exception where a change crossed the t_hash implementation boundary. Anyway, it's definitely manageable. That said, with the recent SDS embedding change, do you see any concerns with this approach?

If we're aligned that this is the right direction, I'll go ahead and start working on the PR.

@zuiderkwast
Contributor

Specifically, when storing a reference to the alternate type, we increment the pointer by 1 and then decrement it upon access. This handling is mostly scoped within the hash type code, IIRC there was just one exception where a change crossed the t_hash implementation. Anyway, it's definitely manageable. That said, with the recent SDS embedding change, do you see any concerns with this approach?

No concerns. We don't need to use pointer bit manipulation for the hash entry value. There is already a way to set flag bits in the hash entry and one of them is used for flagging embedded/non-embedded value. Another will be used for the presence of TTL (Ran's work).

I looked briefly at the t_hash.c code. Some internal functions return sds, e.g. hashTypeEntryGetValue, hashTypeCurrentFromHashTable and a few more. We can refactor this code to return a char pointer and length instead. It will probably be more lines to change, but the result will be better.

If we're aligned that this is the right direction, I'll go ahead and start working on the PR.

Yes, I'm glad we're aligned. Please go ahead!

@zuiderkwast
Contributor

Closing this now. You will open another PR for the other approach?
