Merged
Changes from 2 commits
libs/astradb/langchain_astradb/__init__.py (2 additions, 0 deletions)
@@ -6,6 +6,7 @@
from langchain_astradb.chat_message_histories import AstraDBChatMessageHistory
from langchain_astradb.document_loaders import AstraDBLoader
from langchain_astradb.storage import AstraDBByteStore, AstraDBStore
+from langchain_astradb.utils.encoders import AstraDBVectorStoreDocumentEncoder
from langchain_astradb.vectorstores import AstraDBVectorStore

__all__ = [
@@ -16,5 +17,6 @@
"AstraDBChatMessageHistory",
"AstraDBLoader",
"AstraDBVectorStore",
+"AstraDBVectorStoreDocumentEncoder",
Collaborator:

Do we have to make it public?

Collaborator (PR author):

Here is my reasoning for this choice:

  • the two subclasses, written specifically for the "native with vectorize" and "native without vectorize" cases, are private and used only internally.
  • the same will hold for the (probably two) classes for autodetection (again with and without vectorize).
  • however, "advanced" users who have populated their documents in an unusual way might want to design their own encoder class. To do so they would naturally subclass the abstract one, which should therefore remain public.

How does that sound?

Collaborator (PR author):

I am making _AstraDBVectorStoreDocumentEncoder private for now: exposing user-provided encoders may or may not be on the roadmap, and there will always be time to make the base class public if that happens.

"CollectionVectorServiceOptions",
]
libs/astradb/langchain_astradb/utils/encoders.py (5 additions, 5 deletions)
@@ -28,16 +28,16 @@ def _default_encode_filter(filter_dict: dict[str, Any]) -> dict[str, Any]:
return metadata_filter


-class VSDocumentEncoder(ABC):
+class AstraDBVectorStoreDocumentEncoder(ABC):
    """A document encoder for the Astra DB vector store.

    The document encoder contains the information for consistent interaction
    with documents as stored on the Astra DB collection.

    Implementations of this class must:
    - define how to encode/decode documents consistently to and from
-      Astra DB collections. The two operations must combine to the identity
-      on both sides.
+      Astra DB collections. The two operations must, so to speak, combine
+      to the identity on both sides (except for the quirks of their signatures).
    - provide the adequate projection dictionaries for running find
      operations on Astra DB, with and without the field containing the vector.
    - encode IDs to the `_id` field on Astra DB.
@@ -98,7 +98,7 @@ def encode_filter(self, filter_dict: dict[str, Any]) -> dict[str, Any]:
"""


-class DefaultVSDocumentEncoder(VSDocumentEncoder):
+class _DefaultVSDocumentEncoder(AstraDBVectorStoreDocumentEncoder):
"""Encoder for the default vector store usage with client-side embeddings.

This encoder expresses how document are stored for collections created
@@ -148,7 +148,7 @@ def encode_filter(self, filter_dict: dict[str, Any]) -> dict[str, Any]:
return _default_encode_filter(filter_dict)


-class DefaultVectorizeVSDocumentEncoder(VSDocumentEncoder):
+class _DefaultVectorizeVSDocumentEncoder(AstraDBVectorStoreDocumentEncoder):
"""Encoder for the default vector store usage with server-side embeddings.

This encoder expresses how document are stored for collections created
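The contract listed in the encoder docstring (encode/decode combining to the identity, base and full projections, IDs in `_id`) can be illustrated with a small standalone class. This is a hypothetical sketch, not the langchain-astradb implementation: the class name, method signatures and the `content`/`metadata` field names are assumptions, `$vector` is used as the field Astra DB reserves for vectors, and `apply_projection` only imitates an inclusion projection locally. It reads "except for the quirks of their signatures" one plausible way: `encode` takes a vector that `decode` does not return.

```python
from typing import Any


class ContractEncoderSketch:
    """Hypothetical encoder illustrating the contract listed in the docstring.

    Field names ("content", "metadata") are assumptions for this sketch.
    """

    # Projections for find operations: without and with the vector field.
    base_projection: dict[str, bool] = {"_id": True, "content": True, "metadata": True}
    full_projection: dict[str, bool] = {**base_projection, "$vector": True}

    def encode_id(self, doc_id: str) -> dict[str, str]:
        # Document IDs are written to the `_id` field.
        return {"_id": doc_id}

    def encode(
        self,
        doc_id: str,
        content: str,
        vector: list[float],
        metadata: dict[str, Any],
    ) -> dict[str, Any]:
        return {
            **self.encode_id(doc_id),
            "content": content,
            "metadata": dict(metadata),
            "$vector": list(vector),
        }

    def decode(self, stored: dict[str, Any]) -> tuple[str, str, dict[str, Any]]:
        # The vector is not part of the output, so decode works on documents
        # fetched with either projection.
        return stored["_id"], stored["content"], dict(stored["metadata"])


def apply_projection(stored: dict[str, Any], projection: dict[str, bool]) -> dict[str, Any]:
    """Mimic an inclusion projection: keep only fields mapped to True."""
    return {key: value for key, value in stored.items() if projection.get(key)}


encoder = ContractEncoderSketch()
stored = encoder.encode("doc-1", "hello", [0.1, 0.2], {"lang": "en"})
without_vector = apply_projection(stored, encoder.base_projection)
# Round trip up to the signature quirk: the vector goes in but does not come out.
assert encoder.decode(without_vector) == ("doc-1", "hello", {"lang": "en"})
assert "$vector" not in without_vector
assert apply_projection(stored, encoder.full_projection)["$vector"] == [0.1, 0.2]
```

Keeping two projections lets similarity-search paths request the vector only when it is actually needed, while `decode` stays agnostic of which projection was used.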