Skip to content

Conversation

@hemidactylus
Copy link
Collaborator

This PR aims at simplifying the chain of calls around the various [a]similarity_search_* methods for the vector store.

The main goal is readability and reduction of duplicated code: the former via a rationalization of which method reformats which one's result; the latter through shared "by sort" function where the sort parameter can reflect both the vectorize and the regular vector-based approaches.

Additionally, access of by-vector ANN calls is now raising an exception if the store is set to use server-side embeddings (i.e. Vectorize).

Attached is the new flow of the chain of calls around the similarity-search methods.

forest3_ss_planned

@hemidactylus
Copy link
Collaborator Author

Note: the two top methods, ss(Q) and ss_byV(V), pay the undeserved penalty of having the API return similarity scores that are then discarded.

I think this is a negligible price given the response times of ANN queries and can be left as is in the name of simplicity. I might be wrong, though. (still, I'll go with this plan for now).

"AstraDBChatMessageHistory",
"AstraDBLoader",
"AstraDBVectorStore",
"AstraDBVectorStoreDocumentEncoder",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to make it public ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is my reasoning for this choice:

  • the two sub classes, precisely made for the "native with vectorize" and "native without" two cases, are private and used onlhy internally.
  • so will the (probably)two classes for autodetect (with/out vectorize, same thing)
  • but probably "advanced" users who have filled their docs in a very peculiar way might want to design their Encoder class .. and to do so they will naturally subclass the abstract one ==> which then should remain public.

How does that sound?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am making _AstraDBVectorStoreDocumentEncoder private for now, since the possible exposure of user-provided encoders might or might not be on the roadmap and there'll always be time to make the base class private at that time.

@hemidactylus hemidactylus merged commit 8c44f10 into main Aug 28, 2024
@hemidactylus hemidactylus deleted the SL-simplify-ssearch-methods branch August 28, 2024 09:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants