Skip to content
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
128 changes: 128 additions & 0 deletions examples/functionality/vector_store/milvus_lite/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,128 @@
# MilvusLite Vector Store

This example demonstrates how to use **MilvusLiteStore** for vector storage and semantic search in AgentScope.
It includes four test scenarios covering CRUD operations, metadata filtering, document chunking, and distance metrics.

### Quick Start

Install agentscope first, and then the MilvusLite dependency:

```bash
# In MacOS/Linux
pip install pymilvus\[milvus_lite\]

# In Windows
pip install pymilvus[milvus_lite]
```

Run the example script, which showcases adding, searching with/without filters in MilvusLite vector store:

```bash
python milvuslite_store.py
```

> **Note:** The script creates `.db` files in the current directory. You can delete them after testing.

## Usage

### Initialize Store
```python
from agentscope.rag import MilvusLiteStore

store = MilvusLiteStore(
uri="./milvus_test.db",
collection_name="test_collection",
dimensions=768, # Match your embedding model
distance="COSINE", # COSINE, L2, or IP
)
```

### Add Documents

```python
from agentscope.rag import Document, DocMetadata
from agentscope.message import TextBlock

doc = Document(
metadata=DocMetadata(
content=TextBlock(type="text", text="Your document text"),
doc_id="doc_1",
chunk_id=0,
total_chunks=1,
),
embedding=[0.1, 0.2, ...], # Your embedding vector
)

await store.add([doc])
```

### Search

```python
results = await store.search(
query_embedding=[0.15, 0.25, ...],
limit=5,
score_threshold=0.9, # Optional
filter='doc_id like "prefix%"', # Optional
)
```

### Delete

```python
await store.delete(filter_expr='doc_id == "doc_1"')
```

## Distance Metrics

| Metric | Description | Best For |
|--------|-------------|----------|
| **COSINE** | Cosine similarity | Text embeddings (recommended) |
| **L2** | Euclidean distance | Spatial data |
| **IP** | Inner Product | Recommendation systems |

## Filter Expressions

```python
# Exact match
filter='doc_id == "doc_1"'

# Pattern matching
filter='doc_id like "prefix%"'

# Numeric and logical operators
filter='chunk_id >= 0 and total_chunks > 1'
```

## Advanced Usage

### Access Underlying Client
```python
client = store.get_client()
stats = client.get_collection_stats(collection_name="test_collection")
```

### Document Metadata
- `content`: Text content (TextBlock)
- `doc_id`: Unique document identifier
- `chunk_id`: Chunk position (0-indexed)
- `total_chunks`: Total chunks in document

## FAQ

**What embedding dimension should I use?**
Match your embedding model's output dimension (e.g., 768 for BERT, 1536 for OpenAI ada-002).

**Can I change the distance metric after creation?**
No, create a new collection with the desired metric.

**How do I delete the database?**
Delete the `.db` file specified in the `uri` parameter.

**Is this suitable for production?**
MilvusLite works well for development and small-scale applications. For production at scale, consider Milvus standalone or cluster mode.

## References

- [Milvus Documentation](https://milvus.io/docs)
- [AgentScope RAG Tutorial](https://doc.agentscope.io/tutorial/task_rag.html)
Loading
Loading