Commit e900d9c

AIGCool and DavdGao authored
feat: Add support for milvus lite vector database in the RAG module of AgentScope (#825)
--------- Co-authored-by: DavdGao <[email protected]>
1 parent 176d53b commit e900d9c

7 files changed: +786 -0 lines changed

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
# MilvusLite Vector Store

This example demonstrates how to use **MilvusLiteStore** for vector storage and semantic search in AgentScope.
It includes four test scenarios covering CRUD operations, metadata filtering, document chunking, and distance metrics.

## Quick Start

Install agentscope first, then the MilvusLite dependency:

```bash
# On macOS/Linux
pip install pymilvus\[milvus_lite\]

# On Windows
pip install pymilvus[milvus_lite]
```

Run the example script, which demonstrates adding documents and searching with and without filters in the MilvusLite vector store:

```bash
python milvuslite_store.py
```

> **Note:** The script creates `.db` files in the current directory. You can delete them after testing.

## Usage

### Initialize Store

```python
from agentscope.rag import MilvusLiteStore

store = MilvusLiteStore(
    uri="./milvus_test.db",
    collection_name="test_collection",
    dimensions=768,  # Match your embedding model
    distance="COSINE",  # COSINE, L2, or IP
)
```
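
Because `dimensions` has to match the embedding model's output, it can help to check vector lengths before handing them to the store. The following is a minimal sketch, not part of AgentScope: `has_expected_dimension` and `EXPECTED_DIM` are hypothetical names, and 768 is assumed to be the value passed as `dimensions` above.

```python
from typing import Sequence

# Assumed to equal the `dimensions` argument given to the store.
EXPECTED_DIM = 768


def has_expected_dimension(
    embedding: Sequence[float],
    expected: int = EXPECTED_DIM,
) -> bool:
    """Return True when the vector length equals the expected dimension."""
    return len(embedding) == expected


# A 768-value vector passes; a 10-value vector does not.
print(has_expected_dimension([0.0] * 768))
print(has_expected_dimension([0.0] * 10))
```

Running a check like this before insertion surfaces a mismatched embedding model early, with a clear message of your own choosing.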
39+
40+
### Add Documents
41+
42+
```python
43+
from agentscope.rag import Document, DocMetadata
44+
from agentscope.message import TextBlock
45+
46+
doc = Document(
47+
metadata=DocMetadata(
48+
content=TextBlock(type="text", text="Your document text"),
49+
doc_id="doc_1",
50+
chunk_id=0,
51+
total_chunks=1,
52+
),
53+
embedding=[0.1, 0.2, ...], # Your embedding vector
54+
)
55+
56+
await store.add([doc])
57+
```
58+
59+
### Search
60+
61+
```python
62+
results = await store.search(
63+
query_embedding=[0.15, 0.25, ...],
64+
limit=5,
65+
score_threshold=0.9, # Optional
66+
filter='doc_id like "prefix%"', # Optional
67+
)
68+
```
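
The `await` calls in these snippets only work inside a coroutine. The sketch below shows one way to drive an add-then-search flow with `asyncio.run`. To stay runnable without Milvus installed, it uses a hypothetical in-memory stand-in (`InMemoryStore`) that is not the AgentScope implementation; its simplified row format and its scoring rule (cosine similarity, keeping hits at or above `score_threshold`) are assumptions made for illustration.

```python
import asyncio
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class InMemoryStore:
    """Hypothetical stand-in; NOT the AgentScope MilvusLiteStore."""

    def __init__(self):
        self._rows = []  # list of (doc_id, embedding) pairs

    async def add(self, rows):
        self._rows.extend(rows)

    async def search(self, query_embedding, limit, score_threshold=None):
        scored = [(doc_id, cosine(query_embedding, emb)) for doc_id, emb in self._rows]
        if score_threshold is not None:
            # Assumption: keep only hits scoring at or above the threshold.
            scored = [hit for hit in scored if hit[1] >= score_threshold]
        scored.sort(key=lambda hit: hit[1], reverse=True)
        return scored[:limit]


async def main():
    store = InMemoryStore()
    await store.add(
        [("doc_1", [1.0, 0.0]), ("doc_2", [0.0, 1.0]), ("doc_3", [1.0, 1.0])]
    )
    return await store.search([1.0, 0.0], limit=5, score_threshold=0.5)


# asyncio.run creates an event loop and runs the coroutine to completion.
results = asyncio.run(main())
print([doc_id for doc_id, _ in results])
```

With the real store, the same pattern applies: put the `await store.add(...)` and `await store.search(...)` calls inside `main()` and run it once with `asyncio.run`.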
69+
70+
### Delete
71+
72+
```python
73+
await store.delete(filter_expr='doc_id == "doc_1"')
74+
```
75+
76+
## Distance Metrics
77+
78+
| Metric | Description | Best For |
79+
|--------|-------------|----------|
80+
| **COSINE** | Cosine similarity | Text embeddings (recommended) |
81+
| **L2** | Euclidean distance | Spatial data |
82+
| **IP** | Inner Product | Recommendation systems |
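
To make the three metrics concrete, here is a small sketch of their textbook definitions in plain Python. The helper names are hypothetical, and the values Milvus returns may be scaled or ordered differently, so treat this as an illustration of the math rather than of the store's output.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Inner product of the two vectors after normalizing by their lengths."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as `a`, twice the length

print(cosine_similarity(a, b))  # 1.0: direction only, length is ignored
print(l2_distance(a, b))        # 1.0: the vectors are one unit apart
print(inner_product(a, b))      # 2.0: grows with vector length
```

The example shows why cosine similarity suits text embeddings: two vectors pointing the same way score identically regardless of magnitude, while L2 and inner product are both sensitive to length.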
83+
84+
## Filter Expressions
85+
86+
```python
87+
# Exact match
88+
filter='doc_id == "doc_1"'
89+
90+
# Pattern matching
91+
filter='doc_id like "prefix%"'
92+
93+
# Numeric and logical operators
94+
filter='chunk_id >= 0 and total_chunks > 1'
95+
```
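
When filter strings are assembled from variables, a small helper can keep quoting consistent. The sketch below is an assumption-laden illustration, not an AgentScope or pymilvus API: `compare` and `all_of` are hypothetical names, and string values containing quotes or backslashes are rejected rather than escaped.

```python
ALLOWED_OPS = {"==", "!=", ">", ">=", "<", "<=", "like"}


def compare(field: str, op: str, value) -> str:
    """Build one clause such as `doc_id == "doc_1"` or `chunk_id >= 0`."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op}")
    if isinstance(value, bool) or not isinstance(value, (str, int, float)):
        raise TypeError("value must be str, int, or float")
    if isinstance(value, str):
        if '"' in value or "\\" in value:
            raise ValueError("quotes and backslashes are not handled in this sketch")
        return f'{field} {op} "{value}"'
    return f"{field} {op} {value}"


def all_of(*clauses: str) -> str:
    """Join clauses with `and`, parenthesizing each to keep precedence explicit."""
    return " and ".join(f"({clause})" for clause in clauses)


expr = all_of(
    compare("doc_id", "like", "prefix%"),
    compare("chunk_id", ">=", 0),
)
print(expr)  # (doc_id like "prefix%") and (chunk_id >= 0)
```

The resulting string can be passed wherever the examples above use a literal filter expression.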
96+
97+
## Advanced Usage
98+
99+
### Access Underlying Client
100+
```python
101+
client = store.get_client()
102+
stats = client.get_collection_stats(collection_name="test_collection")
103+
```
104+
105+
### Document Metadata
106+
- `content`: Text content (TextBlock)
107+
- `doc_id`: Unique document identifier
108+
- `chunk_id`: Chunk position (0-indexed)
109+
- `total_chunks`: Total chunks in document
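
The relationship between `doc_id`, `chunk_id`, and `total_chunks` can be sketched with a simple fixed-size splitter. This is an illustration only: `ChunkMeta` and `chunk_text` are hypothetical names that mirror the fields listed above, and character-based splitting is an assumption, not how AgentScope's readers chunk documents.

```python
from dataclasses import dataclass


@dataclass
class ChunkMeta:
    """Hypothetical record mirroring the metadata fields listed above."""

    text: str
    doc_id: str
    chunk_id: int  # 0-indexed position within the document
    total_chunks: int


def chunk_text(text: str, doc_id: str, chunk_size: int) -> list:
    """Split text into fixed-size character chunks that share one doc_id."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    pieces = [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]
    total = len(pieces)
    return [
        ChunkMeta(text=piece, doc_id=doc_id, chunk_id=index, total_chunks=total)
        for index, piece in enumerate(pieces)
    ]


chunks = chunk_text("abcdefghij", doc_id="doc_1", chunk_size=4)
for chunk in chunks:
    print(chunk.chunk_id, chunk.total_chunks, chunk.text)
```

Every chunk of the same source document carries the same `doc_id` and `total_chunks`, which is what lets filters such as `doc_id == "doc_1"` select or delete a whole document at once.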
110+
111+
## FAQ
112+
113+
**What embedding dimension should I use?**
114+
Match your embedding model's output dimension (e.g., 768 for BERT, 1536 for OpenAI ada-002).
115+
116+
**Can I change the distance metric after creation?**
117+
No, create a new collection with the desired metric.
118+
119+
**How do I delete the database?**
120+
Delete the `.db` file specified in the `uri` parameter.
121+
122+
**Is this suitable for production?**
123+
MilvusLite works well for development and small-scale applications. For production at scale, consider Milvus standalone or cluster mode.
124+
125+
## References
126+
127+
- [Milvus Documentation](https://milvus.io/docs)
128+
- [AgentScope RAG Tutorial](https://doc.agentscope.io/tutorial/task_rag.html)
