Feature/valkey vector store#20889
Conversation
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
|
As discussed in #20785 , this PR uses TEXT field support which is currently in RC (Release Candidate) in Valkey-Search, not yet GA. The implementation is complete and ready for review, but I'm keeping this as a draft until TEXT support reaches GA (estimated March 2026). I'll mark it ready for merge once the official release is available. In the meantime, feel free to review the implementation and provide any feedback. Happy to address comments while we wait for the GA release. |
AstraBert
left a comment
There was a problem hiding this comment.
This looks great, thanks for doing this! I added some comments related to questions/things I would like to change :)
There was a problem hiding this comment.
This is auto-generated from our docs build, no need to modify it
| class TokenEscaper: | ||
| """ | ||
| Escape punctuation within an input string. Taken from RedisOM Python. | ||
| """ | ||
|
|
||
| # Keep for compatibility. Characters that RediSearch requires us to escape during queries. | ||
| # Source: https://redis.io/docs/stack/search/reference/escaping/#the-rules-of-text-field-tokenization | ||
| DEFAULT_ESCAPED_CHARS = r"[,.<>{}\[\]\\\"\':;!@#$%^&*()\-+=~\/ ]" | ||
|
|
||
| def __init__(self, escape_chars_re: Optional[Pattern] = None): | ||
| if escape_chars_re: | ||
| self.escaped_chars_re = escape_chars_re | ||
| else: | ||
| self.escaped_chars_re = re.compile(self.DEFAULT_ESCAPED_CHARS) | ||
|
|
||
| def escape(self, value: str) -> str: | ||
| def escape_symbol(match: re.Match) -> str: | ||
| value = match.group(0) | ||
| return f"\\{value}" | ||
|
|
||
| return self.escaped_chars_re.sub(escape_symbol, value) |
There was a problem hiding this comment.
I would probably implement this as a function rather than a class
There was a problem hiding this comment.
removed completely. Valkey's filter implementation handles special characters. Added tests to verify.
| url_parts = valkey_url.replace("valkey://", "").split(":") | ||
| host = url_parts[0] if len(url_parts) > 0 else "localhost" | ||
| port = int(url_parts[1]) if len(url_parts) > 1 else 6379 |
There was a problem hiding this comment.
I feel like using something like urllib would make the logic easier for parsing the URL here
|
|
||
| # Create sync client immediately (synchronous operation) | ||
| if not valkey_client: | ||
| from glide_sync import ( |
There was a problem hiding this comment.
Imports should be placed at the top
| @property | ||
| def client( | ||
| self, | ||
| ) -> SyncGlideClient | SyncGlideClusterClient | GlideClient | GlideClusterClient: | ||
| """Return the valkey client instance.""" | ||
| return self._valkey_client or self._valkey_client_async |
There was a problem hiding this comment.
We generally use two different properties, one for the sync client (client) and one for the asynchronous one (aclient)
| def _ensure_sync_client(self) -> None: | ||
| """ | ||
| Ensure sync client is available. | ||
|
|
||
| Raises: | ||
| Exception: If sync client is not available. | ||
|
|
||
| """ | ||
| if not self._valkey_client: | ||
| raise ValkeyVectorStoreError("No sync client available") |
There was a problem hiding this comment.
In the spirit of the lazily initialization comment above, I would initialize the sync client here once (pretty much as you do for the async client)
| VECTOR_FIELD_NAME: str = "vector" | ||
|
|
||
|
|
||
| class ValkeyIndexInfo: |
There was a problem hiding this comment.
This should probably be a dataclass
|
|
||
| GLIDE_AVAILABLE = True | ||
| except ImportError: | ||
| GLIDE_AVAILABLE = False |
There was a problem hiding this comment.
This should always be available as I am guessing is part of the valkey-glide and valkey-glide-sync required dependencies in the pyproject.toml(?)
If it is not available, I would consider adding it as a required dependency, or, if you do not want overhead for the package, you could just add it as a dev dependency to be bundled only with tests
| [tool.uv.sources] | ||
| llama-index-core = {path = "../../../llama-index-core", editable = true} |
There was a problem hiding this comment.
I feel like this might mess up some things when someone tries to install the package, I would remove it
| pass | ||
|
|
||
|
|
||
| @pytest.mark.integration |
There was a problem hiding this comment.
I don't see any configuration related to the integration marker here for pytest, and I am not sure our CI would automatically skip them, so I think you might need to add it in the pytest.ini config in the pyproject file
|
@AstraBert Thanks for the review! All comments have been addressed — ready for a second look when you get a chance. |
Signed-off-by: Daria Korenieva <daric2612@gmail.com>
Signed-off-by: Daria Korenieva <daric2612@gmail.com>
Signed-off-by: Daria Korenieva <daric2612@gmail.com>
cdf9f14 to
c6b795b
Compare
|
The valkey-search module with text search support is now GA (v1.2.0). |
|
Sorry, right now we are pausing contributions that contribute net-new packages. I appreciate the contribution but going to close this out. |
|
Thanks for the review and for letting me know. I understand the pause on new packages. If the policy changes in the future, I'd be happy to rebase and resubmit. |
Description
Adds a new vector store integration for Valkey, an open-source key-value datastore that supports high-performance vector similarity search through the valkey-search module.
Fixes #20785
Dependencies
Required
Development
Standard LlamaIndex dev dependencies (pytest, mypy, ruff, etc.)
Motivation and Context
Valkey is gaining significant traction as an open-source alternative to Redis. Several factors make this integration timely and valuable:
New Package?
Did I fill in the
tool.llamahubsection in thepyproject.tomland provide a detailed README.md for my new integration or package?Version Bump?
It is a new integration.
Type of Change
New feature (non-breaking change which adds functionality)
How Has This Been Tested?
I added new unit and integration tests to cover this change. Manual testing.
Suggested Checklist:
uv run make format; uv run make lintto appease the lint gods