Feature discussion: Lexical and hybrid search with Milvus 2.5 #1157

I'd like to start a discussion on how we could add support for the new lexical (full-text) search in Milvus 2.5, both for pure lexical search and for hybrid search. This would mean that the text is stored directly in the vector database.

@doberst Before I start on a PR, could we please align on the design?

Here's how I would envisage it working in practice:

Lexical search:
```python
LLMWareConfig().set_active_db("milvus")
# set_config takes one (name, value) pair per call
MilvusConfig().set_config("host", "localhost")
MilvusConfig().set_config("port", 19530)

...

parsing_output = library.add_files(ingestion_folder_path)
query_results = Query(library).text_query(test_query, result_count=10)
```

and for hybrid search:
```python
LLMWareConfig().set_active_db("milvus")
# set_config takes one (name, value) pair per call
MilvusConfig().set_config("host", "localhost")
MilvusConfig().set_config("port", 19530)

...

embedding_model = "mini-lm-sbert"
library.add_files(ingestion_folder_path)
library.install_new_embedding(embedding_model_name=embedding_model, vector_db=vector_db, batch_size=100)

# Open question: is dual_pass_query equivalent to hybrid search? It doesn't seem to be documented.
query_results = Query(library).hybrid_query(sample_query, result_count=20)
```
I think calling `LLMWareConfig().set_active_vector_db("milvus")` when Milvus is already the lexical database should emit a warning that the call is unnecessary, rather than throw an exception.
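To make the warn-but-continue behaviour concrete, here is a minimal, self-contained sketch. `SketchConfig` and its methods are hypothetical stand-ins I made up for illustration, not llmware's actual `LLMWareConfig` implementation; the only real API used is Python's standard `warnings` module.

```python
import warnings


class SketchConfig:
    """Hypothetical stand-in for LLMWareConfig; not the real llmware class."""

    _conf = {"active_db": None, "vector_db": None}

    @classmethod
    def set_active_db(cls, name):
        cls._conf["active_db"] = name

    @classmethod
    def set_active_vector_db(cls, name):
        # Assumed rule: if Milvus already stores the text, a separate
        # vector DB setting is redundant, so warn and carry on.
        if name == "milvus" and cls._conf["active_db"] == "milvus":
            warnings.warn(
                "Milvus is already the active text database; "
                "setting it as the vector database is unnecessary.",
                UserWarning,
                stacklevel=2,
            )
        # The value is still stored, so existing scripts keep working.
        cls._conf["vector_db"] = name
```

The point of using a warning here is that existing scripts which set both databases would continue to run unchanged.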

Also, `library.add_files()` should do the chunking, but no data should be inserted into the database until you call `library.install_new_embedding` or `Query(library).text_query`. The reason is that you need the full schema up front and want to insert the text and the embedding in the same write. You can add fields dynamically in Milvus, but it's very inefficient, and updating existing entities is even more inefficient.
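A rough sketch of the deferred-insert idea, in plain Python with no Milvus dependency. Everything here is hypothetical: `DeferredLibrary`, `flush`, the fixed-size chunking, and the in-memory `inserted_rows` list (which stands in for the Milvus collection) are assumptions for illustration only, not llmware or pymilvus APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class DeferredLibrary:
    """Hypothetical sketch: chunks are staged in memory and written once."""

    staged_chunks: List[str] = field(default_factory=list)
    # Stands in for the Milvus collection in this sketch.
    inserted_rows: List[Dict] = field(default_factory=list)

    def add_files(self, texts: List[str], chunk_size: int = 20) -> int:
        # Chunk only; nothing is written to the database yet.
        for text in texts:
            for start in range(0, len(text), chunk_size):
                self.staged_chunks.append(text[start:start + chunk_size])
        return len(self.staged_chunks)

    def flush(self, embed: Optional[Callable[[str], List[float]]] = None) -> int:
        # Build complete rows (text plus optional embedding) and insert
        # them in one pass, so no entity has to be updated later.
        rows = []
        for chunk in self.staged_chunks:
            row = {"text": chunk}
            if embed is not None:
                row["embedding"] = embed(chunk)
            rows.append(row)
        self.inserted_rows.extend(rows)
        self.staged_chunks.clear()
        return len(rows)
```

In this sketch, `text_query` would trigger `flush()` with no embedding function, while `install_new_embedding` would trigger `flush(embed=...)`, so each row is written exactly once with its final set of fields.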

Do you think it would be a better design if the work of `install_new_embedding` were done inside `add_files`, with the embedding model configured through `MilvusConfig().set_config`?
