Vector store, autodetect mode #65
libs/astradb/langchain_astradb/utils/vector_store_autodetect.py
…everted docstring on setup_mode
Vector Store "autodetect mode", preliminary guide

Summary

The newly introduced "autodetect" mode for the Astra DB Vector Store assumes

Upon initialization, the vector store class figures out the collection and

Tested scenarios:

All the following have been tested to work with this init mode.

Note that the vector stores created through LangFlow fall in the first two cases,

Usage tips

If you anticipate the collection to be populated by ingestion pipelines other than

Note: autodetect will fall back to "native" mode, also for back-compatibility, in these

Typical usage

You can initialize the store with the

    import os

    from langchain_astradb import AstraDBVectorStore

    store = AstraDBVectorStore(
        collection_name="my_collection",
        token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
        api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
        autodetect_collection=True,
    )

Note that most other init parameters can also be passed (see below for details).

Additional parameters:
(Note: both parameters can also be used outside of the autodetect mode -- except for

Forbidden parameters:

The following parameters, pertaining to how the collection should be created, are not permitted

Tested scenarios, details

Here is a minimal script to try a basic similarity search with an autodetected store:

    import os
    import logging
    logging.basicConfig(level=5)
    from langchain_astradb import AstraDBVectorStore

    ad_store = AstraDBVectorStore(
        collection_name=os.environ["AUTODETECT_COLLECTION_NAME"],
        token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
        api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
        autodetect_collection=True,
    )

    for doc, sco in ad_store.similarity_search_with_score(
        "Tell me so and so",
        k=2,
    ):
        print(f"\n\n{'=' * 80}")
        print(f"Score = {sco}")
        print(f"Page content = {doc.page_content}")

The following outlines the preparation steps preliminary to running the above

"native" non-vectorize store

Simply use a regular vector store with client-side embeddings and have it

Afterwards, try the autodetect script above on the same collection (but see the following Note).

Note 1: in this case, you need to create the correct embedding, e.g.:

    from langchain_openai import OpenAIEmbeddings

    embedding = OpenAIEmbeddings(
        api_key=os.environ["OPENAI_API_KEY"],
        model="text-embedding-ada-002",
    )

and then pass it as an additional argument (

(Note 2: in this case,

"native" vectorize-based store

Use a regular vectorize-based vector store and have it

Afterwards, try the autodetect script above on the same collection.

non-vectorize with imported CSV

You can create a Collection in the Astra UI with "Bring your own embedding", setting

An example CSV is as follows (requires a dimension of 1536):

Once the collection is created and populated, try the autodetect script given above, but check the following Note.

Note 1: in this case, you need to create the correct embedding, e.g.:

    from langchain_openai import OpenAIEmbeddings

    embedding = OpenAIEmbeddings(
        api_key=os.environ["OPENAI_API_KEY"],
        model="text-embedding-ada-002",
    )

and then pass it as an additional argument (

Note 2: if you leave

vectorize collection with imported CSV, "$vectorize" column

Create an empty collection with vectorize and upload a CSV with a column explicitly named

Then you can simply run the autodetect mini script above as is.

vectorize collection with imported CSV, marked column

Create an empty collection with vectorize and upload a CSV with no

In this example it would be

Then you can simply run the autodetect mini script above as is.

vectorize collection with PDF

Create an empty collection with vectorize and choose the PDF format in the Upload wizard.

Then you can simply run the autodetect mini script above as is.
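
The "Forbidden parameters" rule in the guide can be pictured as a small guard that runs before the store is built. The following is only a sketch under stated assumptions: `check_autodetect_kwargs` is a hypothetical helper, not part of `langchain_astradb`, and the six names are copied from the docstring lines quoted later in this review, on the assumption that they form the complete forbidden set.

```python
from typing import Any

# Names copied from the docstring excerpt quoted in the review comments.
# Treating them as the complete forbidden set is an assumption.
FORBIDDEN_IN_AUTODETECT = (
    "metric",
    "setup_mode",
    "metadata_indexing_include",
    "metadata_indexing_exclude",
    "collection_indexing_policy",
    "collection_vector_service_options",
)


def check_autodetect_kwargs(autodetect_collection: bool, **kwargs: Any) -> None:
    """Hypothetical helper: reject collection-creation settings in autodetect mode."""
    if not autodetect_collection:
        # Outside autodetect mode this sketch places no restriction.
        return
    offending = sorted(
        name for name in FORBIDDEN_IN_AUTODETECT if kwargs.get(name) is not None
    )
    if offending:
        raise ValueError(
            "Not allowed with autodetect_collection=True: " + ", ".join(offending)
        )
```

Calling `check_autodetect_kwargs(True, collection_name="my_collection")` passes silently, while adding `metric="cosine"` raises a `ValueError` naming the offending parameter. How the real class reports such a conflict is not shown in this PR excerpt.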
LGTM, great work!
| `metadata_indexing_exclude` | ||
| `collection_indexing_policy` | ||
| `collection_vector_service_options` | ||
| `metric`, `setup_mode`, `metadata_indexing_include`, |
I think in RST, an inline literal is written with two backticks.
Nit, since it was already written with single ticks before this change.
Thanks for the info!
I am changing all backticks in the vectorstore docstrings to double backticks right now.
| Note: | ||
| For concurrency in synchronous :meth:`~add_texts`:, as a rule of thumb, on a | ||
| typical client machine it is suggested to keep the quantity | ||
| For concurrency in synchronous :meth:``~add_texts``:, as a rule of thumb, |
Here it should remain single tick 😉
ouch! ahahah. Fixing again :)
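
For what it's worth, the distinction in this thread (double backticks for inline literals, single backticks inside roles such as `:meth:`) can be applied mechanically. Below is a rough sketch, not the script used in this PR: `to_rst_literals` is a hypothetical helper that upgrades plain single-backtick spans and leaves role targets and existing double-backtick literals untouched.

```python
import re

# Alternatives are tried left to right, so roles and existing literals
# are consumed (and kept) before the single-backtick case can match.
_TICKS = re.compile(
    r"(?P<role>:[\w:.+-]+:`[^`\n]+`)"   # :meth:`~add_texts`  -> keep as is
    r"|(?P<literal>``[^`\n]+``)"        # ``metric``          -> keep as is
    r"|`(?P<single>[^`\n]+)`"           # `metric`            -> convert
)


def to_rst_literals(text: str) -> str:
    """Hypothetical helper: turn `word` into ``word`` except inside roles."""

    def _fix(match: re.Match) -> str:
        single = match.group("single")
        if single is not None:
            return f"``{single}``"
        return match.group(0)

    return _TICKS.sub(_fix, text)


print(to_rst_literals("pass `metric` to :meth:`~add_texts`"))
```

This sketch does not repair a role that was already over-converted (for example a role followed by double backticks); that case would need a separate pass.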
| (specifically, how often a write is an update of an existing id). | ||
| Remember you can pass concurrency settings to individual calls to | ||
| :meth:`~add_texts` and :meth:`~add_documents` as well. | ||
| :meth:``~add_texts`` and :meth:``~add_documents`` as well. |
Keep single ticks
| Completely delete the collection from the database (as opposed | ||
| to :meth:`~clear`, which empties it only). | ||
| to :meth:``~clear``, which empties it only). |
single tick
| Completely delete the collection from the database (as opposed | ||
| to :meth:`~aclear`, which empties it only). | ||
| to :meth:``~aclear``, which empties it only). |
single tick
| **kwargs: you can pass any argument that you would | ||
| to :meth:`~add_texts` and/or to the 'AstraDBVectorStore' constructor | ||
| (see these methods for details). These arguments will be | ||
| to :meth:``~add_texts`` and/or to the |
single tick
| to :meth:`~aadd_texts` and/or to the 'AstraDBVectorStore' constructor | ||
| (see these methods for details). These arguments will be | ||
| routed to the respective methods as they are. | ||
| to :meth:``~aadd_texts`` and/or to the ``AstraDBVectorStore`` |
Single tick for aadd_texts
| """Create an Astra DB vectorstore from a document list. | ||
| Utility method that defers to 'from_texts' (see that one). | ||
| Utility method that defers to ``from_texts`` (see that one). |
NIT: could use :meth: here for better reference.
| Args: see 'from_texts', except here you have to supply 'documents' | ||
| in place of 'texts' and 'metadatas'. | ||
| Args: see ``from_texts``, except here you have to supply 'documents' |
Not related to this PR: incorrect syntax for Args
Doing this while I'm at it ...
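
As an illustration of the two suggestions above (a `:meth:` reference and a well-formed `Args` section), the docstring could look roughly like the sketch below. This assumes the comment about "incorrect syntax" refers to Google-style sections, where each argument gets its own indented entry. `SketchStore` is a stand-in class written for this example, not the real `AstraDBVectorStore`, and the method bodies are placeholders.

```python
from typing import Any, List


class SketchStore:
    """Stand-in class used only to host the docstring example."""

    @classmethod
    def from_texts(cls, texts: List[str], **kwargs: Any) -> "SketchStore":
        """Create a store from a list of texts (placeholder body)."""
        return cls()

    @classmethod
    def from_documents(cls, documents: List[Any], **kwargs: Any) -> "SketchStore":
        """Create a store from a document list.

        Utility method that defers to :meth:`from_texts` (see that one).

        Args:
            documents: the documents to insert, supplied in place of the
                ``texts`` and ``metadatas`` arguments of :meth:`from_texts`.
            **kwargs: passed through unchanged to :meth:`from_texts`.

        Returns:
            A new store instance.
        """
        # Placeholder extraction: real documents carry page_content and metadata.
        texts = [getattr(doc, "page_content", str(doc)) for doc in documents]
        return cls.from_texts(texts, **kwargs)
```

Note the single backticks inside `:meth:` and the double backticks around the plain parameter names, matching the convention settled earlier in the review.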
I will add some explanations these days (it's too late now, sorry).