# MilvusLite Vector Store

This example demonstrates how to use **MilvusLiteStore** for vector storage and semantic search in AgentScope.
It includes four test scenarios covering CRUD operations, metadata filtering, document chunking, and distance metrics.

## Quick Start

Install agentscope first, then the MilvusLite dependency:

```bash
# On macOS/Linux (the backslashes keep shells such as zsh from globbing the brackets)
pip install pymilvus\[milvus_lite\]

# On Windows
pip install pymilvus[milvus_lite]
```

Run the example script, which demonstrates adding documents and searching with and without filters in the MilvusLite vector store:

```bash
python milvuslite_store.py
```

> **Note:** The script creates `.db` files in the current directory. You can delete them after testing.

## Usage

### Initialize Store

```python
from agentscope.rag import MilvusLiteStore

store = MilvusLiteStore(
    uri="./milvus_test.db",
    collection_name="test_collection",
    dimensions=768,       # Match your embedding model
    distance="COSINE",    # COSINE, L2, or IP
)
```

### Add Documents

```python
from agentscope.rag import Document, DocMetadata
from agentscope.message import TextBlock

doc = Document(
    metadata=DocMetadata(
        content=TextBlock(type="text", text="Your document text"),
        doc_id="doc_1",
        chunk_id=0,
        total_chunks=1,
    ),
    embedding=[0.1, 0.2, ...],  # Your embedding vector
)

await store.add([doc])
```

### Search

```python
results = await store.search(
    query_embedding=[0.15, 0.25, ...],
    limit=5,
    score_threshold=0.9,              # Optional
    filter='doc_id like "prefix%"',   # Optional
)
```

### Delete

```python
await store.delete(filter_expr='doc_id == "doc_1"')
```

## Distance Metrics

| Metric | Description | Best For |
|--------|-------------|----------|
| **COSINE** | Cosine similarity | Text embeddings (recommended) |
| **L2** | Euclidean distance | Spatial data |
| **IP** | Inner Product | Recommendation systems |

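To make the difference between the three metrics concrete, here is a minimal pure-Python sketch of the textbook definitions. The helper names and sample vectors are illustrative only and are not part of AgentScope or Milvus. How the store scales or orders the scores it returns is not covered here.

```python
import math


def inner_product(a, b):
    """IP: sum of element-wise products (larger means more similar)."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """L2: Euclidean distance (smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """COSINE: inner product of the vectors after length normalization."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


# Hypothetical 2-dimensional vectors, chosen so the results are easy to check.
query = [1.0, 0.0]
same_direction = [2.0, 0.0]   # same direction as the query, different length
orthogonal = [0.0, 3.0]       # at a right angle to the query

for name, vec in [("same_direction", same_direction), ("orthogonal", orthogonal)]:
    print(
        name,
        cosine_similarity(query, vec),
        l2_distance(query, vec),
        inner_product(query, vec),
    )
```

Cosine similarity ignores vector length, while L2 and IP are both sensitive to it. That is one reason cosine is a common default for text embeddings whose magnitudes carry no meaning.
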
## Filter Expressions

```python
# Exact match
filter='doc_id == "doc_1"'

# Pattern matching
filter='doc_id like "prefix%"'

# Numeric and logical operators
filter='chunk_id >= 0 and total_chunks > 1'
```

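Because filters are plain strings, it can help to build them with small helpers instead of hand-writing quotes each time. The sketch below is only an illustration: the function names are hypothetical, it performs no real escaping (it simply rejects values containing `"` or `%`), and it only produces expressions of the same shape as the examples above.

```python
def doc_id_equals(doc_id):
    """Build an exact-match filter on doc_id."""
    _reject_special(doc_id)
    return f'doc_id == "{doc_id}"'


def doc_id_prefix(prefix):
    """Build a prefix-match filter on doc_id using `like`."""
    _reject_special(prefix)
    return f'doc_id like "{prefix}%"'


def all_of(*clauses):
    """Combine clauses with `and`, parenthesizing each one."""
    return " and ".join(f"({clause})" for clause in clauses)


def _reject_special(value):
    # Assumption: rather than guessing at escaping rules, refuse risky input.
    if '"' in value or "%" in value:
        raise ValueError(f"unsupported character in filter value: {value!r}")


combined = all_of(doc_id_prefix("report_"), "chunk_id >= 0")
print(combined)
```
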
## Advanced Usage

### Access Underlying Client

```python
client = store.get_client()
stats = client.get_collection_stats(collection_name="test_collection")
```

### Document Metadata

- `content`: Text content (TextBlock)
- `doc_id`: Unique document identifier
- `chunk_id`: Chunk position (0-indexed)
- `total_chunks`: Total chunks in document

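As a rough illustration of how `chunk_id` and `total_chunks` relate when one document is split into several chunks, the sketch below uses plain dictionaries in place of `DocMetadata`. The `chunk_text` helper and its fixed-size character splitting are assumptions made for this example, not the chunking strategy used by AgentScope.

```python
def chunk_text(text, doc_id, chunk_size):
    """Split text into fixed-size pieces and attach chunk metadata to each."""
    pieces = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return [
        {
            "content": piece,
            "doc_id": doc_id,              # shared by every chunk of the document
            "chunk_id": index,             # 0-indexed position within the document
            "total_chunks": len(pieces),   # same value on every chunk
        }
        for index, piece in enumerate(pieces)
    ]


chunks = chunk_text("abcdefghij", doc_id="doc_1", chunk_size=4)
for chunk in chunks:
    print(chunk)
```

With metadata shaped like this, a filter such as `doc_id == "doc_1"` selects every chunk of one document, and sorting by `chunk_id` restores the original order.
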
## FAQ

**What embedding dimension should I use?**
Match your embedding model's output dimension (e.g., 768 for BERT-base, 1536 for OpenAI text-embedding-ada-002).

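A dimension mismatch is easier to diagnose if it is caught before documents reach the store. The check below is a minimal sketch. `check_dimensions` is a hypothetical helper, not an AgentScope function, and `768` simply mirrors the `dimensions` value used in the initialization example.

```python
def check_dimensions(embedding, expected):
    """Return the embedding unchanged, or raise if its length is wrong."""
    if len(embedding) != expected:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, expected {expected}"
        )
    return embedding


EXPECTED_DIMENSIONS = 768  # must match the `dimensions` passed to the store

vector = check_dimensions([0.0] * 768, EXPECTED_DIMENSIONS)
print(len(vector))
```
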
**Can I change the distance metric after creation?**
No, create a new collection with the desired metric.

**How do I delete the database?**
Delete the `.db` file specified in the `uri` parameter.

**Is this suitable for production?**
MilvusLite works well for development and small-scale applications. For production at scale, consider Milvus standalone or cluster mode.

## References

- [Milvus Documentation](https://milvus.io/docs)
- [AgentScope RAG Tutorial](https://doc.agentscope.io/tutorial/task_rag.html)