Vector store, refactor encoding to an Astra document #52
Looks awesome
I want to draw your attention to the fact that this refactor is "imperfect", in that it leaks usage of Indeed an invariant for implementers should be to map the doc id to `_id` (descended from an Astra DB requirement). Now, if you're happy with this, I'm fine with going on and perhaps refining this later (a slightly annoying task because it would mean kicking
It's very nice to replace some of the if-else with OOP.
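To make the if-else-to-OOP point concrete, here is a minimal sketch of the general idea, not the code in this PR. The names `Doc`, `DocumentEncoder`, `ClientSideEncoder`, `ServerSideEncoder` and the `content` field are hypothetical; only the `_id` mapping and the `$vectorize` distinction come from the discussion here, and `$vector` is assumed as the field for client-computed embeddings.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Doc:
    """Stand-in for a vector-store document (hypothetical, for illustration)."""

    id: str
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


class DocumentEncoder(ABC):
    """Hypothetical interface: one object owns the Doc <-> stored-dict mapping."""

    # True when the database computes embeddings itself ($vectorize).
    server_side_embeddings: bool

    @abstractmethod
    def encode(self, doc: Doc, vector: list[float] | None) -> dict[str, Any]:
        """Turn a Doc (plus an optional vector) into the stored dict."""

    @abstractmethod
    def decode(self, stored: dict[str, Any]) -> Doc:
        """Turn a stored dict back into a Doc."""

    @property
    @abstractmethod
    def projection(self) -> dict[str, bool]:
        """Fields to request when querying."""


class ClientSideEncoder(DocumentEncoder):
    """Embeddings are computed by the caller and stored under $vector."""

    server_side_embeddings = False

    def encode(self, doc: Doc, vector: list[float] | None) -> dict[str, Any]:
        if vector is None:
            raise ValueError("a client-side vector is required")
        # Invariant: the document id always maps to _id.
        return {
            "_id": doc.id,
            "content": doc.text,  # assumed field name
            "metadata": doc.metadata,
            "$vector": vector,
        }

    def decode(self, stored: dict[str, Any]) -> Doc:
        return Doc(stored["_id"], stored["content"], stored.get("metadata", {}))

    @property
    def projection(self) -> dict[str, bool]:
        return {"_id": True, "content": True, "metadata": True}


class ServerSideEncoder(DocumentEncoder):
    """The text is sent under $vectorize and embedded by the database."""

    server_side_embeddings = True

    def encode(self, doc: Doc, vector: list[float] | None) -> dict[str, Any]:
        if vector is not None:
            raise ValueError("vectors are computed server-side")
        return {"_id": doc.id, "$vectorize": doc.text, "metadata": doc.metadata}

    def decode(self, stored: dict[str, Any]) -> Doc:
        return Doc(stored["_id"], stored["$vectorize"], stored.get("metadata", {}))

    @property
    def projection(self) -> dict[str, bool]:
        return {"_id": True, "$vectorize": True, "metadata": True}
```

With this shape, calling code picks an encoder once and then calls `encode`, `decode` and `projection` without branching on whether vectorize is in use.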
This PR factors out the translations to and from the document stored in the Astra DB collection.
To do so, a new `VSDocumentEncoder` interface is proposed, providing operations such as Document-to-Astra and vice versa, knowledge of the proper projection clauses when querying, and also the knowledge of whether embeddings are server-side (`$vectorize`) or not.

The main advantage of this refactoring lies in the next two planned phases after this PR:
`similarity_search*` methods, aimed at sharing as much as possible between the vectorize and the non-vectorize paths;