Description
I'd like to start a discussion on how we could add support for the new lexical (full-text) search introduced in Milvus 2.5, for both pure lexical search and hybrid search. This would mean storing the text directly in the vector database.
@doberst Before I start on a PR, could we agree on the design?
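For context on the two modes: Milvus 2.5's full-text search ranks documents with BM25, and hybrid search merges a lexical ranking with a dense-vector ranking, for which Milvus offers reciprocal rank fusion (`RRFRanker`). Milvus does all of this server-side; the pure-Python sketch that follows only illustrates the arithmetic and is not Milvus's implementation. Whitespace tokenization, the parameters `k1=1.2`, `b=0.75` and `k=60` (common defaults), and the "dense" ranking in the usage lines are assumptions for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every doc against the query with Okapi BM25 (naive whitespace tokens)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many docs each term appears at least once.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer docs need more matches for the same score.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each ranking is a list of doc ids, best first."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]


docs = [
    "milvus supports full text search",
    "vector databases store embeddings",
    "bm25 ranks text by term frequency",
]
scores = bm25_scores("full text search", docs)
lexical = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
dense = [1, 0, 2]  # hypothetical ranking from a dense-vector search
print(rrf_fuse([lexical, dense]))  # → [0, 1, 2]
```

RRF only uses rank positions, so BM25 scores and cosine similarities never have to be put on a common scale.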
Here's how I would envisage it working in practice:
Lexical search:
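```python
from llmware.configs import LLMWareConfig, MilvusConfig
from llmware.retrieval import Query

LLMWareConfig().set_active_db("milvus")  # proposed: Milvus as the text (lexical) database
MilvusConfig().set_config("host", "localhost")
MilvusConfig().set_config("port", 19530)
...
parsing_output = library.add_files(ingestion_folder_path)
query_results = Query(library).text_query(test_query, result_count=10)
```

And for hybrid search: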
```python
from llmware.configs import LLMWareConfig, MilvusConfig
from llmware.retrieval import Query

LLMWareConfig().set_active_db("milvus")
MilvusConfig().set_config("host", "localhost")
MilvusConfig().set_config("port", 19530)
...
embedding_model = "mini-lm-sbert"
library.add_files(ingestion_folder_path)
library.install_new_embedding(embedding_model_name=embedding_model, vector_db="milvus", batch_size=100)
# proposed method; is dual_pass_query equivalent to hybrid search? It doesn't seem to be documented.
query_results = Query(library).hybrid_query(sample_query, result_count=20)
```

When Milvus is the lexical database, I think calling LLMWareConfig().set_active_vector_db("milvus") should emit a warning that the call is unnecessary, rather than raise an exception.
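A minimal sketch of that warning behavior. `LLMWareConfigSketch` is a hypothetical stand-in, not llmware's real `LLMWareConfig`; its attribute names and the warning text are assumptions.

```python
import warnings


class LLMWareConfigSketch:
    """Hypothetical stand-in for LLMWareConfig, only to illustrate the proposal."""

    _active_db = "sqlite"
    _active_vector_db = None

    @classmethod
    def set_active_db(cls, name):
        cls._active_db = name

    @classmethod
    def set_active_vector_db(cls, name):
        if cls._active_db == "milvus" and name == "milvus":
            # Milvus already stores text and vectors together, so this is redundant:
            # warn instead of raising, so existing scripts keep working.
            warnings.warn(
                "Milvus is already the active text database; "
                "set_active_vector_db('milvus') is unnecessary.",
                UserWarning,
            )
        cls._active_vector_db = name
```

Because the call still sets the vector DB, a script that configures both stays valid; it just gets a one-line hint.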
Also, library.add_files() should do the chunking, but no data should be inserted into the database until you call library.install_new_embedding() or Query(library).text_query(). The full schema has to be known before inserting, and the text and embedding should be inserted together: Milvus can add fields dynamically, but that is very inefficient, and updating existing entities is even less efficient.
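A sketch of the deferred-insert idea, assuming a hypothetical `PendingLibrary` class in place of llmware's `Library`: `add_files` only chunks and buffers, and each chunk is written exactly once, with its embedding via `install_new_embedding`, or text-only when `text_query` is first called. The `inserted` list stands in for one batched Milvus insert, the fixed-size character chunking is an assumption, and the substring match in `text_query` is only a placeholder for BM25 search.

```python
from dataclasses import dataclass, field


@dataclass
class PendingLibrary:
    """Hypothetical sketch: chunks stay in memory until the full row can be written."""

    pending: list = field(default_factory=list)
    inserted: list = field(default_factory=list)  # stands in for the Milvus collection

    def add_files(self, texts, chunk_size=20):
        # Chunk now, but write nothing to the database yet.
        for text in texts:
            for start in range(0, len(text), chunk_size):
                self.pending.append(text[start:start + chunk_size])

    def _flush(self, embed=None):
        # One batched write per flush; each entity is inserted once, never updated.
        rows = []
        for chunk in self.pending:
            row = {"text": chunk}
            if embed is not None:
                row["embedding"] = embed(chunk)
            rows.append(row)
        self.inserted.extend(rows)
        self.pending.clear()
        return len(rows)

    def install_new_embedding(self, embed):
        return self._flush(embed)

    def text_query(self, query):
        self._flush()  # lexical-only path: insert text without embeddings
        return [r["text"] for r in self.inserted if query in r["text"]]
```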
Would it be a better design to move the work of install_new_embedding into add_files, with the embedding model configured through MilvusConfig().set_config?
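To make the alternative concrete, a sketch assuming a hypothetical `embedding_model` key on a MilvusConfig-like class (`MilvusConfigSketch`) and a hypothetical free function `add_files`; neither is llmware's API. If a model is configured, each chunk is embedded and inserted in a single pass; otherwise only text is stored.

```python
class MilvusConfigSketch:
    """Hypothetical settings holder with an added 'embedding_model' key."""

    _conf = {"host": "localhost", "port": 19530, "embedding_model": None}

    @classmethod
    def set_config(cls, name, value):
        cls._conf[name] = value

    @classmethod
    def get_config(cls, name):
        return cls._conf[name]


def add_files(chunks, embedders, store):
    """Hypothetical one-pass ingestion: embed (if configured) and insert each chunk once."""
    model_name = MilvusConfigSketch.get_config("embedding_model")
    embed = embedders[model_name] if model_name else None
    for chunk in chunks:
        row = {"text": chunk}
        if embed is not None:
            row["embedding"] = embed(chunk)
        store.append(row)  # stands in for a Milvus insert
    return len(chunks)
```

One trade-off of this sketch: the collection is tied to whichever model was configured at ingestion time, whereas a separate install_new_embedding step leaves room to add another embedding later.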