Adding support for sharing memory between the module and the engine #2472

yairgott · 2025-08-12T04:57:26Z

Overview

Sharing memory between the module and engine reduces memory overhead by eliminating redundant copies of stored records in the module. This is particularly beneficial for search workloads that require indexing large volumes of documents.

Vectors

Vector similarity search requires storing large volumes of high-dimensional vectors. For example, a single 512-dimensional vector of 4-byte floats consumes 2048 bytes, and typical workloads often involve millions of vectors. Due to the lack of a memory-sharing mechanism between the module and the engine, valkey-search currently doubles memory consumption when indexing vectors, significantly increasing operational costs. This limitation introduces adoption friction and reduces valkey-search's competitiveness.
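To make the cost concrete, here is a minimal sketch of the arithmetic, assuming 4-byte float components (the 2048-byte figure above implies this). The helper names are hypothetical and are not part of valkey-search.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for one vector, assuming 4-byte float components. */
static size_t vector_bytes(size_t dims) {
    return dims * sizeof(float);
}

/* Total bytes for `count` vectors kept in `copies` places
 * (1 = shared between module and engine, 2 = duplicated). */
static uint64_t index_bytes(size_t dims, uint64_t count, unsigned copies) {
    return (uint64_t)vector_bytes(dims) * count * copies;
}
```

Under these assumptions, 10 million 512-dimensional vectors take about 20.5 GB per copy, so removing the duplicate saves roughly that much.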

Memory Allocation Strategy

At a fundamental level, there are two primary allocation strategies:

[Chosen] Module-allocated memory shared with the engine.
Engine-allocated memory shared with the module.

For valkey-search, it is crucial that vectors reside in cache-aligned memory to maximize SIMD optimizations. Allowing the module to allocate memory provides greater flexibility for different use cases, though it introduces slightly higher implementation complexity.
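As an illustration of why module-side allocation helps, the sketch below deep-copies a vector into cache-line-aligned memory using C11 aligned_alloc. The 64-byte line size and the helper names are assumptions; the actual module may use a different allocator.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64 /* assumption: 64-byte cache lines */

/* Deep-copies `len` bytes into a newly allocated, cache-line-aligned buffer.
 * The size is rounded up because aligned_alloc expects a multiple of the
 * alignment. The caller releases the buffer with free(). */
static void *copy_aligned(const void *src, size_t len) {
    size_t padded = (len + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
    if (padded == 0) padded = CACHE_LINE;
    void *dst = aligned_alloc(CACHE_LINE, padded);
    if (dst != NULL) memcpy(dst, src, len);
    return dst;
}

/* Returns non-zero when `p` starts on a cache-line boundary. */
static int is_cache_aligned(const void *p) {
    return ((uintptr_t)p % CACHE_LINE) == 0;
}
```

If the engine allocated the value instead, the module could not choose this alignment, which is the flexibility argument made above.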

Old Implementation

The old implementation was based on ref-counting and introduced a new SDS type. After further discussion, we agreed to simplify the design by removing ref-counting and avoiding the introduction of a new SDS type.

New Implementation - Key Points

The engine exposes a new interface, VM_HashSetViewValue, which sets a hash field's value as a view of a buffer owned by the module. The function accepts the hash key, hash field, and a buffer along with its length.
ViewValue is a new data type that captures the externalized buffer and its length.
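A rough sketch of what such a type could look like is shown below. The struct layout and the names viewValue and viewValueMake are assumptions for illustration only; the PR's actual definition may differ.

```c
#include <stddef.h>

/* Hypothetical layout: a non-owning reference to module-owned bytes. */
typedef struct {
    const char *buf; /* points into memory owned by the module */
    size_t len;      /* number of valid bytes at buf */
} viewValue;

/* Builds a view over an existing buffer; nothing is copied or allocated. */
static viewValue viewValueMake(const char *buf, size_t len) {
    viewValue v = {buf, len};
    return v;
}
```

Because the view only records a pointer and a length, the module must keep the buffer alive for as long as the engine holds the view.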

valkey-search Usage

Insertion

Upon receiving a key space notification for a new hash or JSON key with an indexed vector attribute, valkey-search allocates cache-aligned memory and deep-copies the vector value.
valkey-search then calls VM_HashSetViewValue to avoid keeping two copies of the vector.

Deletion

When receiving a key space notification for a deleted hash key or hash field that was indexed as a vector, valkey-search deletes the corresponding entry from the index.

Update

Handled similarly to insertion.

Signed-off-by: yairgott <[email protected]>

ranshid

@yairgott placing some high level comments first:

Not sure I like the ViewValue naming. I think the intention is to keep a "string" pointer and a length right? in that case maybe we should simply do that and call it a StringValue ?
I did not understand why we have to change the entryGetValue API? I would prefer to keep a separate API to get the "string" value like entryGetStringValue (or in your case entryGetViewValue)

src/entry.c

Signed-off-by: yairgott <[email protected]>

yairgott · 2025-08-15T17:31:36Z

Not sure I like the ViewValue naming. I think the intention is to keep a "string" pointer and a length right? in that case maybe we should simply do that and call it a StringValue ?

Renamed it to bufferView.

I did not understand why we have to change the entryGetValue API? I would prefer to keep a separate API to get the "string" value like entryGetStringValue (or in your case entryGetViewValue)

entryGetValue is called in many different places, including outside of entry.c and t_hash.c. Implementing a dedicated API to get the view value would lead to wrapping each entryGetValue call with a check of whether the value is a view or not.

codecov · 2025-08-15T17:31:47Z

Codecov Report

❌ Patch coverage is 66.21622% with 75 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.41%. Comparing base (05540af) to head (f2f7f80).

Files with missing lines	Patch %	Lines
src/entry.c	64.80%	44 Missing ⚠️
src/t_hash.c	60.37%	21 Missing ⚠️
src/module.c	18.18%	9 Missing ⚠️
src/ziplist.c	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff            @@
##           unstable    #2472   +/-   ##
=========================================
  Coverage     72.41%   72.41%           
=========================================
  Files           128      128           
  Lines         70414    70501   +87     
=========================================
+ Hits          50987    51054   +67     
- Misses        19427    19447   +20

Files with missing lines	Coverage Δ
src/aof.c	`81.21% <100.00%> (-0.05%)`	⬇️
src/db.c	`93.10% <100.00%> (-0.06%)`	⬇️
src/rdb.c	`77.16% <100.00%> (+0.09%)`	⬆️
src/server.h	`100.00% <ø> (ø)`
src/ziplist.c	`15.16% <0.00%> (ø)`
src/module.c	`9.79% <18.18%> (+0.02%)`	⬆️
src/t_hash.c	`94.50% <60.37%> (-1.66%)`	⬇️
src/entry.c	`80.35% <64.80%> (-15.02%)`	⬇️

... and 12 files with indirect coverage changes

ranshid · 2025-08-17T12:20:28Z

I am still providing high-level comments (although I do have detailed comments) since IMO we should solve the high-level issues first. I think we should have a clear definition of the entry API.

Currently the entry is defined as a storage for sds type field and sds type value. In your suggestion this is changed and an entry can be provided with a value which is either an sds or a pointer to a native string and will INTERNALLY encode it the way it would like to.

Let's list the reasons for this change, IIUC:

1. The current entry API only supports sds value types. For different cases (e.g. the VSS module) it might be needed to have hashes keep string references which are NOT sds types.
2. The entry is taking ownership of the value sds. In some cases it is needed that the entry will only keep a reference of the value and will NOT own the value (i.e. not free it when deleted and not account for the memory as part of the object).

Your suggestion is to change the entry API so:

1. It will be able to accept both sds type values AND string-references.
2. When a string reference is provided the entry will NOT take ownership over it.
3. string-references will NOT be embedded (following (2)) so it implies these entries will always have the FIELD_SDS_AUX_BIT_ENTRY_HAS_VALUE_PTR set.
4. string-references will NOT be accounted as the entry memory usage in entryMemUsage (I did not notice it is handled, and actually maybe there is a bug there)
5. value getters will not allow access to the value in the encoding it was provided. This means that although entry allowed providing the input value as sds and is INTERNALLY encoding it as sds, there is no way to retrieve a reference to the sds value.
6. Entry which is set with string-reference value, is ALWAYS expected to provide string reference value (?)

I think that following this suggestion I would handle the following cases:

switch between string-reference and sds value encodings - Not sure how this is currently handled correctly. I see that entrySetValuePtr is assuming (6)? I think that the entryCreate and entryUpdate APIs should allow providing the value as either sds or string reference. This can be done either by providing different function prototypes for each (which is complicated) or by adding some API options like:

/* In this case the API will validate that either the value OR the vref is provided, and assert otherwise. */
entry *entryCreate(const_sds field, sds value, char *vref, size_t vlen, long long expiry);
entry *entryUpdate(entry *e, sds value, char *vref, size_t vlen, long long expiry);

OR

/* In this case the user will always have to provide the string value pointer and size, but indicate explicitly whether the value is sds. */
entry *entryCreate(const_sds field, char *value, size_t vlen, bool value_is_sds, long long expiry);
entry *entryUpdate(entry *e, char *value, size_t vlen, bool value_is_sds, long long expiry);

OR

/* In this case the user will always have to provide the string value pointer and size, but indicate that the value is sds by passing vlen zero (0). I think this is a little bit prone to bugs, and an explicit way to indicate whether the input is sds might be better. */
entry *entryCreate(const_sds field, char *value, size_t vlen, long long expiry);
entry *entryUpdate(entry *e, char *value, size_t vlen, long long expiry);

the entryGetValue change is generally fine, but it should be clear and reflected through the entire entry documentation.
I also think that we should provide a way to access the value as sds if it was encoded this way.
value/bufferView - IMO "view" is not a great word for what it is. Personally, stringRef reflects the real usage and provides a fine indication of what and why the entry will use it the way it does.
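The first prototype option above relies on a rule that exactly one of the owned value or the borrowed reference is supplied. A minimal sketch of that check follows; it uses plain char pointers in place of sds, and the names valueKind, valueArgsValid and classifyValue are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { VALUE_IS_SDS, VALUE_IS_STRINGREF } valueKind;

/* Exactly one of `value` (owned by the entry) or `vref` (borrowed) must be set. */
static bool valueArgsValid(const char *value, const char *vref) {
    return (value != NULL) != (vref != NULL);
}

/* Asserts on invalid input, then reports which encoding the caller asked for. */
static valueKind classifyValue(const char *value, const char *vref) {
    assert(valueArgsValid(value, vref));
    return value != NULL ? VALUE_IS_SDS : VALUE_IS_STRINGREF;
}
```

The explicit-flag variant (value_is_sds) avoids this check entirely, at the cost of one more parameter, which is the trade-off being discussed.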

I also think the top comment might be missing some more alternatives that we consider(?):
for example:

add reference count option to entry. (probably not a good fit for VSS module case... but can we explain exactly why?)
create a new reference count sds type (you did provide some link and explanation to why we did not go this way)
more options if we have them...

yairgott · 2025-08-18T16:20:16Z

string-references will NOT be accounted as the entry memory usage in entryMemUsage (I did not notice it is handled, and actually maybe there is a bug there)

Fixed entryMemUsage

there is no way to retrieve a reference to the sds value.

entryGetValueRef is still available, and remains static to entry.c.

Entry which is set with string-reference value, is ALWAYS expected to provide string reference value

I'm not 100% clear but I'll note that outside of entry.c, one should still be using the public interface entryGetValue.

I think that the entryCreate and entryUpdate APIs should allow providing value as either sds or string reference.

entryCreate - creating an entry with a value view is not supported. An entry may switch to store a value view by calling t_hash.c, hashTypeSetValueView. For safety, I've incorporated a debug mode verification that the provided view buffer matches the entry SDS.
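The debug-mode verification mentioned here could look roughly like the sketch below, which checks that the module's buffer holds the same bytes as the value already stored in the entry. The function name and signature are assumptions, not the code from the PR.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true when the module-provided view has the same length and bytes
 * as the value currently stored in the entry. */
static bool viewMatchesValue(const char *stored, size_t stored_len,
                             const char *view, size_t view_len) {
    return stored_len == view_len && memcmp(stored, view, view_len) == 0;
}
```

A check like this would typically sit behind a debug assertion so that release builds do not pay for the comparison on large vectors.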

entryUpdate is modified to support updating a view value with just the expiration field. FWIW, entrySetValue was primarily changed to improve code reuse and avoid code duplication in entryUpdate; see entrySetValuePtr.

value/bufferView - IMO the view is not a great word for what it is. personally stringRef reflects the real usage and provides a fine indication of what and why the entry will use it the way it does.

Naming is hard :) The name bufferView is inspired by the C++ std::string_view. Also, IMO:

I think that buffer, rather than string, makes a better fit in this case.
The name reference could be confusing due to its connotation with C++.

ranshid · 2025-08-18T18:07:51Z

string-references will NOT be accounted as the entry memory usage in entryMemUsage (I did not notice it is handled, and actually maybe there is a bug there)

Fixed entryMemUsage

there is no way to retrieve a reference to the sds value.

entryGetValueRef is still available, and remains static to entry.c.

entryGetValueRef is meant for internal use and will not work in all cases, so it is not fit for a public API. If, for example, I am a user of entry and I provided an entry with an sds and I need to continue using an sds from that entry, I have no way of doing so aside from creating a new sds.
Like:

sds my_private_data;
entry *e = entryCreate(field, my_private_data, expiry);
...
// When I need to use the value sds for some cases, I am forced to create a new sds:
my_private_data = entryGetValue(e, &len); // X - will not work
my_private_data = sdsnewlen(entryGetValue(e, &len), len); // will work, but requires allocating and making sure to deallocate after use.

Entry which is set with string-reference value, is ALWAYS expected to provide string reference value

I'm not 100% clear but I'll note that outside of entry.c, one should still be using the public interface entryGetValue.

I think that the entryCreate and entryUpdate APIs should allow providing value as either sds or string reference.

entryCreate - creating an entry with a value view is not supported. An entry may switch to store a value view by calling t_hash.c, hashTypeSetValueView. For safety, I've incorporated a debug mode verification that the provided view buffer matches the entry SDS.

entryUpdate is modified to support updating a view value with just the expiration field. FWIW, entrySetValue was primarily changed to improve code reuse and avoid code duplication in entryUpdate; see entrySetValuePtr.

From reading the code I see that:

1. calling entryUpdate with a value which is not a "view" will just assert when the entry has a view already
2. entrySetValue will just assert as well because of (1)

I think this needs to be clear. If the entry allows changing an entry created with an sds value into an entry which has a "view", why can't we do the opposite? And why do we expose a public API that simply asserts instead of preventing this in some way? I think that as the entry is a stand-alone module, it should either be generic and flexible, or the API should be restrictive and not allow what is simply not supported.

value/bufferView - IMO the view is not a great word for what it is. personally stringRef reflects the real usage and provides a fine indication of what and why the entry will use it the way it does.

Naming is hard :) The name bufferView is inspired by the C++ std::string_view. Also, IMO:

I think that buffer, rather than string, makes a better fit in this case.

The name reference could be confusing due to its connotation with C++.

Well we are not doing c++ unfortunately, and pointers here are treated as references IMO. So I think it is a better name, but will not make this the main point of the review. I just think that the way to distinguish a "native c style string" to what you have which requires to ALWAYS keep the provided reference is to use a name like stringref.

To conclude. I would like to have a complete API which is both generic and standalone together with supportive to the existing usecases we have. As I mentioned before, we could allow entry to accept both sds value and "native c style strings" and encode it internally which is subject to the internal implementation which should prefer memory efficiency and performance (avoid large copies and better cache line locality).

For first thing we should solve the part of the API which accepts values if we plan to NOT allow accepting . How do we handle the HSET or HINCRBY commands? Do we intercept it from the module in case of the VSS? So how should other modules handle it?

yairgott · 2025-08-18T19:03:53Z

From reading the code I see that:

1. calling entryUpdate with a value which is not a "view" will just assert when the entry has a view already

2. entrySetValue will just assert as well because of (1)

Right, I'll fix this. The idea is to handle adding expiry to a view value. Otherwise, the existing code, with slight changes, will handle updating a view value.

value/bufferView - IMO the view is not a great word for what it is. personally stringRef reflects the real usage and provides a fine indication of what and why the entry will use it the way it does.

Naming is hard :) The name bufferView is inspired by the C++ std::string_view. Also, IMO:

I think that buffer, rather than string, makes a better fit in this case.

The name reference could be confusing due to its connotation with C++.

Well we are not doing c++ unfortunately, and pointers here are treated as references IMO. So I think it is a better name, but will not make this the main point of the review. I just think that the way to distinguish a "native c style string" to what you have which requires to ALWAYS keep the provided reference is to use a name like stringref.

I'm not religious about the name, if you feel strongly about the stringref, I'll just change it.

To conclude. I would like to have a complete API which is both generic and standalone together with supportive to the existing usecases we have. As I mentioned before, we could allow entry to accept both sds value and "native c style strings" and encode it internally which is subject to the internal implementation which should prefer memory efficiency and performance (avoid large copies and better cache line locality).

I think that making the API complete is not really an objective but rather supporting the use-cases. Take for example entryCreate where there is no use-case which requires creation of a stringref value entry, but rather a stringref value entry is always triggered by the module on an existing entry. Adding support for a "complete" API would add another layer of complexity without providing any value. entryCreate, entryUpdate APIs receive value as SDS, which makes it clear that a stringref object is not supported.

For first thing we should solve the part of the API which accepts values if we plan to NOT allow accepting . How do we handle the HSET or HINCRBY commands? Do we intercept it from the module in case of the VSS? So how should other modules handle it?

Any update to the entry is handled by a call to entryUpdate. The module registers event keyspace notification which is triggered by commands like HSET or HINCRBY. The module event handling logic involves reading the entry and handling the mutation accordingly.

ranshid · 2025-08-18T19:20:00Z

From reading the code I see that:

1. calling entryUpdate with a value which is not a "view" will just assert when the entry has a view already

2. entrySetValue will just assert as well because of (1)

Right, I'll fix this. The idea is to handle adding expiry to a view value. Otherwise, the existing code, with slight changes, will handle updating a view value.

value/bufferView - IMO the view is not a great word for what it is. personally stringRef reflects the real usage and provides a fine indication of what and why the entry will use it the way it does.

Naming is hard :) The name bufferView is inspired by the C++ std::string_view. Also, IMO:

I think that buffer, rather than string, makes a better fit in this case.

The name reference could be confusing due to its connotation with C++.

Well we are not doing c++ unfortunately, and pointers here are treated as references IMO. So I think it is a better name, but will not make this the main point of the review. I just think that the way to distinguish a "native c style string" to what you have which requires to ALWAYS keep the provided reference is to use a name like stringref.

I'm not religious about the name, if you feel strongly about the stringref, I'll just change it.

To conclude. I would like to have a complete API which is both generic and standalone together with supportive to the existing usecases we have. As I mentioned before, we could allow entry to accept both sds value and "native c style strings" and encode it internally which is subject to the internal implementation which should prefer memory efficiency and performance (avoid large copies and better cache line locality).

I think that making the API complete is not really an objective but rather supporting the use-cases. Take for example entryCreate where there is no use-case which requires creation of a stringref value entry, but rather a stringref value entry is always triggered by the module on an existing entry. Adding support for a "complete" API would add another layer of complexity without providing any value. entryCreate, entryUpdate APIs receive value as SDS, which makes it clear that a stringref object is not supported.

I think that the entry should keep a clear and concrete API. this API is not going to be used ONLY by the search module, but potentially by other future parts of the application as well as future modules, and it would be great if we could make the API complete. But let me try and track it better in the detailed review.

For first thing we should solve the part of the API which accepts values if we plan to NOT allow accepting . How do we handle the HSET or HINCRBY commands? Do we intercept it from the module in case of the VSS? So how should other modules handle it?

Any update to the entry is handled by a call to entryUpdate. The module registers event keyspace notification which is triggered by commands like HSET or HINCRBY. The module event handling logic involves reading the entry and handling the mutation accordingly.

So that would require to handle entryUpdate correctly. I see that you suggest you will fix that, so I will followup on that.

ranshid · 2025-08-18T19:20:58Z

@yairgott also please make a run through the current documentation and structure ascii and change them accordingly.

Signed-off-by: yairgott <[email protected]>

yairgott · 2025-08-19T11:46:44Z

So that would require to handle entryUpdate correctly. I see that you suggest you will fix that, so I will followup on that.

entryUpdate already works correctly. Let me know if you find any issues. I've also added unittests.

also please make a run through the current documentation and structure ascii and change them accordingly.

Sure.

Also, noting that I've done the renaming to stringRef.

Signed-off-by: yairgott <[email protected]>

ranshid

Went through some more stuff. Will continue tomorrow.

src/entry.c

src/entry.h

src/entry.c

ranshid · 2025-08-20T15:10:59Z

src/entry.c

    return entryWrite(buf, buf_size, field, value, expiry, embed_value, embedded_field_sds_type, embedded_field_sds_size, embedded_value_sds_size, expiry_size);
 }

+entry *entrySetStringRef(entry *entry, const char *buf, size_t len, long long expiry) {


To me it sounds like we are duplicating some stuff from entryUpdate. The way I imagined it is that the entry can be encoded in any of these ways:

1. field | embedded value
2. value ptr | field
3. ttl | field | embedded value
4. ttl | value ptr | field
5. vstring | vlen | field
6. ttl | vstring | vlen | field

So entryUpdate should be able to navigate between ALL these cases and as such can be used inside entrySetStringRef.
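The six encodings listed above can be modeled as an enum plus a selection function, which shows the set of transitions entryUpdate would need to cover. This is only a sketch; the enum and function names are hypothetical and do not come from entry.c.

```c
#include <stdbool.h>

/* One constant per encoding in the list above, numbered the same way. */
typedef enum {
    LAYOUT_FIELD_EMBEDDED_VALUE = 1, /* 1. field | embedded value */
    LAYOUT_VALUE_PTR_FIELD,          /* 2. value ptr | field */
    LAYOUT_TTL_FIELD_EMBEDDED_VALUE, /* 3. ttl | field | embedded value */
    LAYOUT_TTL_VALUE_PTR_FIELD,      /* 4. ttl | value ptr | field */
    LAYOUT_STRINGREF_FIELD,          /* 5. vstring | vlen | field */
    LAYOUT_TTL_STRINGREF_FIELD       /* 6. ttl | vstring | vlen | field */
} entryLayout;

/* Picks a layout. A string reference is never embedded, so `embed_value`
 * is ignored whenever `is_stringref` is true. */
static entryLayout chooseLayout(bool has_ttl, bool is_stringref, bool embed_value) {
    if (is_stringref) return has_ttl ? LAYOUT_TTL_STRINGREF_FIELD : LAYOUT_STRINGREF_FIELD;
    if (embed_value) return has_ttl ? LAYOUT_TTL_FIELD_EMBEDDED_VALUE : LAYOUT_FIELD_EMBEDDED_VALUE;
    return has_ttl ? LAYOUT_TTL_VALUE_PTR_FIELD : LAYOUT_VALUE_PTR_FIELD;
}
```

An update would compare the layout chosen for the new inputs with the current one and reallocate only when they differ.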

I am also still not a big fan of the fact that there is no real way to create an entry with a stringRef. Maybe it is not needed right now, but the API seems strange that way IMO.

Decoupling with a dedicated function, rather than extending entryUpdate to support string refs, is much more straightforward both to implement and to maintain. Extending entryUpdate would involve:

Supporting both types of input values, sds and [char * buf, size_t len].

Extending the logic to properly handle all the possible combinations.

In general, we should strive for code which is easy to maintain and resilient. IMO entryUpdate is already too complicated, and it already handles too many different cases, which makes it harder to follow and to reason about.

In terms of code reuse, I will try to explore how to improve the code section in lines 324-342.

Edited!

I would like to suggest the following changes to improve the code quality:

Starting with entryUpdate:

entry *entryUpdate(entry *e, sds value, long long expiry) {
    if (entryChangeLayout(e, value, expiry)) {
        entry *new_entry = entryCreate((sds)e, value, expiry);
        entryFree(e);
        return new_entry;
    }
    // Layout was not changed, just apply the value and the expiry
    entryReplaceValue(e, value);
    if (expiry != EXPIRY_NONE) entryReplaceExpiry(e, expiry);
    return e;
}

With that change, entryReqSize is called only by entryCreate, and therefore its logic can be embedded inside entryCreate and greatly simplified.

If you agree with the above, I can drive these changes, but I prefer to do so after this PR has landed. WDYT?

src/module.c

src/t_hash.c

zuiderkwast · 2025-08-25T15:30:15Z

We discussed this in the core team meeting today.

If we've closed on the design and it's been reviewed by next Monday, we can merge it to 9.0 RC 2, but otherwise we can postpone it to 9.1.

Signed-off-by: Yair Gottdenker <[email protected]>

src/t_hash.c

Signed-off-by: Yair Gottdenker <[email protected]>

ranshid · 2025-10-08T08:08:02Z

Overall the code changes LGTM.

NOTE: this might have future conflicts with #2618

@valkey-io/core-team how can we progress the major-decision-pending checkout? does any other maintainer wish to go over and check this?

madolson · 2025-10-27T14:05:32Z

@ranshid Hey, what specifically do you want the core team to address?

madolson · 2025-10-27T14:18:22Z

We reviewed the APIs, they seem minimal but there are no concerns with it. @JimB123 since Ran asked for another pair of eyes, PTAL. Also Ran, please approve if you are happy with the PR.

JimB123 · 2025-10-28T16:51:59Z

@JimB123 since Ran asked for another pair of eyes, PTAL. Also Ran, please approve if you are happy with the PR.

Ack. Reviewing.

src/module.c

tests/modules/hash_stringref.c

src/entry.c

src/t_hash.c

src/db.c

src/server.h

madolson · 2025-11-10T16:43:35Z

@ranshid @JimB123 @yairgott Which of the comments are blocking this PR from getting merged? We have plenty of time before 9.1, but I just don't want to end up in a situation where we don't merge it again.

ranshid

Let me try and sum-up my thoughts about the status:

Documentation improvements

https://github.com/valkey-io/valkey/pull/2472/files#r2473588727
https://github.com/valkey-io/valkey/pull/2472/files#r2471218383
https://github.com/valkey-io/valkey/pull/2472/files#r2473736680
https://github.com/valkey-io/valkey/pull/2472/files#r2470424309
https://github.com/valkey-io/valkey/pull/2472/files#r2470568137
https://github.com/valkey-io/valkey/pull/2472/files#r2470572782
https://github.com/valkey-io/valkey/pull/2472/files#r2470572981

code refactor

create a dedicated scan type: https://github.com/valkey-io/valkey/pull/2472/files#r2479301622

add a dedicated stringRef getter: https://github.com/valkey-io/valkey/pull/2472/files#r2471108477

refactor entryGetValue: https://github.com/valkey-io/valkey/pull/2472/files#r2471145583

entryUpdate refactor: https://github.com/valkey-io/valkey/pull/2472/files#r2474627473 + https://github.com/valkey-io/valkey/pull/2472/files#r2478426332
https://github.com/valkey-io/valkey/pull/2472/files#r2474655831

entryUpdateAsStringRef: https://github.com/valkey-io/valkey/pull/2472/files#r2474037834 + https://github.com/valkey-io/valkey/pull/2472/files#r2478563045 + https://github.com/valkey-io/valkey/pull/2472/files#r2473574601 + https://github.com/valkey-io/valkey/pull/2472/files#r2479164080 + https://github.com/valkey-io/valkey/pull/2472/files#r2479155661

refactor entryConstruct: https://github.com/valkey-io/valkey/pull/2472/files#r2475553245

trivial code refactor

https://github.com/valkey-io/valkey/pull/2472/files#r2471200320
https://github.com/valkey-io/valkey/pull/2472/files#r2471192263
https://github.com/valkey-io/valkey/pull/2472/files#r2479118890
https://github.com/valkey-io/valkey/pull/2472/files#r2479166951
https://github.com/valkey-io/valkey/pull/2472/files#r2479190818
https://github.com/valkey-io/valkey/pull/2472/files#r2479214355

correctness

https://github.com/valkey-io/valkey/pull/2472/files#r2471130746
https://github.com/valkey-io/valkey/pull/2472/files#r2478894678
https://github.com/valkey-io/valkey/pull/2472/files#r2478794808
https://github.com/valkey-io/valkey/pull/2472/files#r2478831245
https://github.com/valkey-io/valkey/pull/2472/files#r2478899287

open issues

https://github.com/valkey-io/valkey/pull/2472/files#r2473620702
https://github.com/valkey-io/valkey/pull/2472/files#r2475548331 - should be handled when we rebase unstable to solve conflict
https://github.com/valkey-io/valkey/pull/2472/files#r2479323821

IMO the open issues + correctness + trivial code refactor can be completed in order to merge it.
We can consider followup in a different PR about the documentation and major code refactor
@JimB123 @yairgott WDYT?

src/entry.c

src/server.h

src/t_hash.c

JimB123 · 2025-11-12T19:17:28Z

IMO the open issues + correctness + trivial code refactor can be completed in order to merge it.
We can consider followup in a different PR about the documentation and major code refactor
@JimB123 @yairgott WDYT?

@ranshid I agree that more significant code refactors (like redesign of entryConstruct) can be addressed separately from this PR. However, I think that documentation issues are critical and should be updated now. This feature introduces some pretty fundamental changes, and we need to make sure that everyone is clear regarding usage of this API and associated memory management considerations.

Of the items you listed as (non-trivial) code refactors, I think we should do this one now:

A separate, typed, stringref getter - https://github.com/valkey-io/valkey/pull/2472/files#r2471108477 This is important for maintainability and for limiting unintended void creep. It's needed to clarify which functions are actually working on stringRefs and which are working on sds. As is, this change makes the code less maintainable as it removes the existing type checking.

ranshid · 2025-11-12T19:31:53Z

IMO the open issues + correctness + trivial code refactor can be completed in order to merge it.
We can consider followup in a different PR about the documentation and major code refactor
@JimB123 @yairgott WDYT?

@ranshid I agree that more significant code refactors (like redesign of entryConstruct) can be addressed separately from this PR. However, I think that documentation issues are critical and should be updated now. This feature introduces some pretty fundamental changes, and we need to make sure that everyone is clear regarding usage of this API and associated memory management considerations.

Of the items you listed as (non-trivial) code refactors, I think we should do this one now:

A separate, typed, stringref getter - https://github.com/valkey-io/valkey/pull/2472/files#r2471108477 This is important for maintainability and for limiting unintended void creep. It's needed to clarify which functions are actually working on stringRefs and which are working on sds. As is, this change makes the code less maintainable as it removes the existing type checking.

I agree about the documentation. just would like @yairgott to be aligned

Signed-off-by: Yair Gottdenker <[email protected]>

src/entry.c

ranshid · 2025-11-19T12:11:13Z

src/entry.c

        mem += zmalloc_usable_size(entryGetAllocPtr(entry));
    }
-    mem += sdsAllocSize((sds)entryGetValue(entry, NULL));
+    if (!entryHasStringRef(entry)) mem += sdsAllocSize((sds)entryGetValue(entry, NULL));


I still feel like we miss accounting for the stringRef (16 bytes) itself.

This is accounted for in line 457.

@yairgott maybe the line numbers shifted with some of the changes.
If I follow the memory accounting steps, an entry with a stringRef does NOT have an embedded value, so I am taking the malloc size of the main entry buffer:

| long long | stringRef (pointer) | sdshdr8+ | "foo" \0 |/ / / / |

Now, that includes the pointer to the stringRef mid-layer, but we never add the mid-layer itself to the count.

Suggested change

-    if (!entryHasStringRef(entry)) mem += sdsAllocSize((sds)entryGetValue(entry, NULL));
+    if (!entryHasStringRef(entry)) mem += sdsAllocSize((sds)entryGetValue(entry, NULL));
+    else mem += sizeof(stringRef); // can also consider mem += zmalloc_usable_size(entryGetStringRefRef(entry));
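The accounting rule in this suggestion can be sketched in isolation as follows. The stringRef layout (pointer plus length, 16 bytes on typical 64-bit platforms) and the function name are assumptions; the real entryMemUsage works on the entry itself rather than on precomputed sizes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed layout of the mid-layer: pointer plus length. */
typedef struct {
    const char *buf;
    size_t len;
} stringRef;

/* entry_alloc: usable size of the main entry buffer.
 * sds_alloc:   allocation size of an owned sds value (ignored for stringRef). */
static size_t entryMemUsageSketch(size_t entry_alloc, bool has_string_ref, size_t sds_alloc) {
    size_t mem = entry_alloc;
    if (!has_string_ref) {
        mem += sds_alloc;         /* the entry owns the sds, so count it */
    } else {
        mem += sizeof(stringRef); /* count the mid-layer, not the module's buffer */
    }
    return mem;
}
```

The module-owned buffer is deliberately left out, since the entry does not own it.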

src/entry.h

Co-authored-by: Ran Shidlansik <[email protected]> Signed-off-by: Yair Gottdenker <[email protected]>

Signed-off-by: Yair Gottdenker <[email protected]>

ranshid · 2025-11-24T12:46:44Z

src/entry.c

-    if (entryHasValuePtr(entry)) {
-        dismissSds(*entryGetValueRef(entry));
-    }
+    if (entryHasValuePtr(entry)) entryFreeValuePtr(entry);


Maybe we can currently keep the existing logic?

Suggested change

-    if (entryHasValuePtr(entry)) entryFreeValuePtr(entry);
+    if (entryHasValuePtr(entry) && !entryHasStringRef(entry)) dismissSds(*entryGetValueRef(entry));

ranshid · 2025-11-24T15:18:48Z

src/entry.c

+ *     +--------------+----------+----------+----------+----------+--------+
+ *                               |
+ *                               |
+ *                               stringRef value


I think the drawing is confusing.

The stringRef is a separate mid-layer that holds the value's size and pointer, so maybe the drawing can illustrate that.
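
One way to picture the mid-layer is as a small struct that sits between the entry and the module's buffer. The sketch below assumes that shape; `layoutStringRef`, `layoutEntry`, and `layoutEntryGetValue` are hypothetical names, and the real entry packs its fields differently.

```c
#include <stddef.h>

/*
 * Assumed relationship between the three pieces:
 *
 *   entry                 mid-layer              module memory
 *   +-----------+        +-----------+          +----------------+
 *   | value ----+------> | buf ------+--------> | value bytes    |
 *   +-----------+        | len       |          +----------------+
 *                        +-----------+
 */
typedef struct {
    const char *buf; /* points into memory owned by the module */
    size_t len;      /* number of value bytes at buf */
} layoutStringRef;

typedef struct {
    const layoutStringRef *value; /* the entry stores only this pointer */
} layoutEntry;

/* Follows both hops and returns the value bytes; stores the length if len is non-NULL. */
static const char *layoutEntryGetValue(const layoutEntry *e, size_t *len) {
    if (len) *len = e->value->len;
    return e->value->buf;
}
```

On a typical 64-bit platform a pointer plus a `size_t` occupies 16 bytes, which matches the stringRef size mentioned earlier in the thread.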

yairgott added 2 commits August 12, 2025 03:03

fixing test issue

9ee29c2

Signed-off-by: yairgott <[email protected]>

fixing test issue

a9fc017

Signed-off-by: yairgott <[email protected]>

yairgott mentioned this pull request Aug 12, 2025

Adding support for sharing memory between the module and the engine #2299

Closed

yairgott added 3 commits August 12, 2025 14:57

addressing unittest issues

7dae1e6

Signed-off-by: yairgott <[email protected]>

addressing unittest issues

5d4c16a

Signed-off-by: yairgott <[email protected]>

addressing lint

f826960

Signed-off-by: yairgott <[email protected]>

yairgott marked this pull request as ready for review August 12, 2025 15:21

madolson requested a review from ranshid August 12, 2025 18:05

madolson added this to Valkey 9.0 Aug 12, 2025

madolson moved this to In Progress in Valkey 9.0 Aug 12, 2025

ranshid reviewed Aug 13, 2025

addressing comments

78b90bb

Signed-off-by: yairgott <[email protected]>

yairgott added 3 commits August 19, 2025 03:03

renaming, bug fixing and adding unittests

6f7f990

Signed-off-by: yairgott <[email protected]>

unittest fix

848fd6d

Signed-off-by: yairgott <[email protected]>

cosmetic change

a7a1537

Signed-off-by: yairgott <[email protected]>

yairgott added 3 commits August 19, 2025 13:31

fixing module tests

06b6cfc

Signed-off-by: yairgott <[email protected]>

Merge remote-tracking branch 'upstream/unstable' into hash_shared

c8a7aac

fixing comments

356891d

Signed-off-by: yairgott <[email protected]>

ranshid reviewed Aug 20, 2025

zuiderkwast added the major-decision-pending Major decision pending by TSC team label Aug 25, 2025

Yair Gottdenker added 3 commits September 9, 2025 19:59

addressing comments

6d5a45e

Signed-off-by: Yair Gottdenker <[email protected]>

addressing code review comments

2aea44d

Signed-off-by: Yair Gottdenker <[email protected]>

fixing module test

37a006c

Signed-off-by: Yair Gottdenker <[email protected]>

ranshid reviewed Sep 14, 2025

Yair Gottdenker added 2 commits September 16, 2025 19:28

addressing code review comments

b97667a

Signed-off-by: Yair Gottdenker <[email protected]>

addressing code review comments

bf6d7c7

Signed-off-by: Yair Gottdenker <[email protected]>

madolson requested a review from JimB123 October 27, 2025 14:10

madolson added major-decision-approved Major decision approved by TSC team and removed major-decision-pending Major decision pending by TSC team labels Oct 27, 2025

JimB123 reviewed Oct 30, 2025

allenss-amazon mentioned this pull request Nov 10, 2025

[NEW] Combined list of Issues for Search Module #2559

Open

ranshid reviewed Nov 10, 2025

addressing review comments

ce0e3a9

Signed-off-by: Yair Gottdenker <[email protected]>

ranshid reviewed Nov 19, 2025

yairgott and others added 7 commits November 19, 2025 10:59

Update src/entry.c

96a4530

Co-authored-by: Ran Shidlansik <[email protected]> Signed-off-by: Yair Gottdenker <[email protected]>

addressing review comments

a0f541d

Signed-off-by: Yair Gottdenker <[email protected]>

Merge branch 'unstable' into hash_shared

90ac405

Signed-off-by: Yair Gottdenker <[email protected]>

fixing rebase issues

2569ea2

Signed-off-by: Yair Gottdenker <[email protected]>

lint fixes

2423aa6

Signed-off-by: Yair Gottdenker <[email protected]>

lint fixes

f03c9ad

Signed-off-by: Yair Gottdenker <[email protected]>

fixing the test module hash_stringref initialization

f2f7f80

Signed-off-by: Yair Gottdenker <[email protected]>

ranshid reviewed Nov 24, 2025
