A Retrieval Augmented Generation (RAG) system for markdown documentation with intelligent rate limiting and MCP server integration.
- Semantic Search: Vector-based similarity search using Google Gemini embeddings
- Markdown-Aware Chunking: Intelligent document splitting that preserves semantic boundaries
- Rate Limiting: Sophisticated sliding window algorithm with token counting and batch optimization
- MCP Server: Model Context Protocol server for AI assistant integration
- PostgreSQL Vector Store: Scalable storage using pgvector extension
- Incremental Updates: Smart deduplication prevents reprocessing existing documents
- Production Ready: Type-safe configuration, comprehensive logging, and error handling
```bash
git clone https://github.com/yourusername/markdown-rag.git
```

Prerequisites:

- Python 3.11+
- PostgreSQL 12+ with pgvector extension installed
- Google Gemini API key
- MCP-compatible client (Claude Desktop, Cline, etc.)
Create the database:

```bash
createdb embeddings
```

The pgvector extension will be automatically enabled when you first run the tool.
```bash
cd markdown-rag
uv run markdown-rag /path/to/docs --command ingest
```

Required environment variables (create `.env` or export):
```bash
POSTGRES_PASSWORD=your_password
GOOGLE_API_KEY=your_gemini_api_key
```

Add to your MCP client configuration (e.g., `claude_desktop_config.json`). The client will automatically start the server.
Minimal configuration:
```json
{
  "mcpServers": {
    "markdown-rag": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/absolute/path/to/markdown-rag",
        "markdown-rag",
        "/absolute/path/to/docs",
        "--command",
        "mcp"
      ],
      "env": {
        "POSTGRES_PASSWORD": "your_password",
        "GOOGLE_API_KEY": "your_api_key"
      }
    }
  }
}
```

Full configuration:
```json
{
  "mcpServers": {
    "markdown-rag": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/absolute/path/to/markdown-rag",
        "markdown-rag",
        "/absolute/path/to/docs",
        "--command",
        "mcp"
      ],
      "env": {
        "POSTGRES_USER": "postgres_username",
        "POSTGRES_PASSWORD": "your_password",
        "POSTGRES_HOST": "postgres_url",
        "POSTGRES_PORT": "1234",
        "POSTGRES_DB": "embeddings",
        "GOOGLE_API_KEY": "your_api_key",
        "RATE_LIMIT_REQUESTS_PER_MINUTE": "100",
        "RATE_LIMIT_REQUESTS_PER_DAY": "1000",
        "DISABLED_TOOLS": "delete_document,update_document"
      }
    }
  }
}
```

The server exposes several tools:
- `query`: semantic search over documentation. Arguments: `query` (string), `num_results` (integer, optional, default: 4)
- List all ingested documents. Arguments: none
- `delete_document`: remove a document from the index. Arguments: `filename` (string)
- `update_document`: re-ingest a specific document. Arguments: `filename` (string)
- `refresh_index`: scan the directory and ingest new or modified files. Arguments: none
To disable tools (e.g., in production), set DISABLED_TOOLS environment variable:
```bash
DISABLED_TOOLS=delete_document,update_document,refresh_index
```

| Variable | Default | Required | Description |
|---|---|---|---|
| `POSTGRES_USER` | `postgres` | No | PostgreSQL username |
| `POSTGRES_PASSWORD` | - | Yes | PostgreSQL password |
| `POSTGRES_HOST` | `localhost` | No | PostgreSQL host |
| `POSTGRES_PORT` | `5432` | No | PostgreSQL port |
| `POSTGRES_DB` | `embeddings` | No | Database name |
| `GOOGLE_API_KEY` | - | Yes | Google Gemini API key |
| `RATE_LIMIT_REQUESTS_PER_MINUTE` | `100` | No | Max API requests per minute |
| `RATE_LIMIT_REQUESTS_PER_DAY` | `1000` | No | Max API requests per day |
| `DISABLED_TOOLS` | - | No | Comma-separated list of tools to disable |
```bash
uv run markdown-rag <directory> [OPTIONS]
```

Arguments:

- `<directory>`: Path to the markdown files directory (required)

Options:

- `-c, --command {ingest|mcp}`: Operation mode (default: `mcp`)
  - `ingest`: process and store documents
  - `mcp`: start the MCP server for queries
- `-l, --level {debug|info|warning|error}`: Logging level (default: `warning`)
Examples:
```bash
uv run markdown-rag ./docs --command ingest --level info
uv run markdown-rag /var/docs -c ingest -l debug
```

The following diagram shows how the system components interact:
```mermaid
graph TD
    A[MCP Client<br/>Claude, ChatGPT, etc.] --> B[FastMCP Server<br/>Tool: query]
    B --> C[MarkdownRAG]
    C --> D[Text Splitters]
    C --> E[Rate Limited Embeddings]
    E --> F[Google Gemini<br/>Embeddings API]
    C --> G[PostgreSQL<br/>+ pgvector]
```
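At query time, retrieval amounts to embedding the query and ranking stored chunks by vector similarity; pgvector does this in-database, but the ranking step can be illustrated in plain Python (a simplified sketch, not the project's implementation):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 4) -> list[str]:
    """Return the k chunk texts whose embeddings are closest to the query."""
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The default of 4 results mirrors the `num_results` default of the `query` tool; pgvector performs the same ranking with an index instead of a full sort.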
The system implements a dual-window sliding algorithm:
- Request Limits: Tracks requests per minute and per day
- Token Limits: Counts tokens before API calls
- Batch Optimization: Calculates maximum safe batch sizes
- Smart Waiting: Minimal delays with automatic retry
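The request-limit side of the bullets above can be sketched with a single deque of timestamps checked against both windows. This is an illustration of the idea only; the project's `rate_limiter.py` also counts tokens and batches, which this sketch omits:

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Sketch of a dual-window request limiter (per-minute and per-day)."""

    def __init__(self, per_minute: int, per_day: int, clock=time.monotonic):
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock  # injectable for testing
        self.events: deque[float] = deque()  # timestamps of past requests

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0 = go now)."""
        now = self.clock()
        # Drop events that have left the 24-hour window.
        while self.events and now - self.events[0] > 86_400:
            self.events.popleft()
        if len(self.events) >= self.per_day:
            return 86_400 - (now - self.events[0])
        recent = [t for t in self.events if now - t <= 60]
        if len(recent) >= self.per_minute:
            return 60 - (now - recent[0])
        return 0.0

    def record(self) -> None:
        """Call after each successful API request."""
        self.events.append(self.clock())
```

The caller sleeps for `wait_time()` before each request and then calls `record()`, which gives the "minimal delays with automatic retry" behavior described above.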
See Architecture Documentation for detailed diagrams.
```bash
git clone https://github.com/yourusername/markdown-rag.git
cd markdown-rag
uv sync
```

Run linters and type checks:

```bash
uv run ruff check .
uv run mypy .
```

This project follows:
- Linting: Ruff with Google docstring convention
- Type Checking: mypy with strict settings
- Line Length: 79 characters
- Import Sorting: Alphabetical with isort
```
markdown-rag/
├── src/markdown_rag/
│   ├── __init__.py
│   ├── main.py            # Entry point and MCP server
│   ├── config.py          # Environment and CLI configuration
│   ├── models.py          # Pydantic data models
│   ├── rag.py             # Core RAG logic
│   ├── embeddings.py      # Rate-limited embeddings wrapper
│   └── rate_limiter.py    # Rate limiting algorithm
├── docs/
│   ├── api-reference.md   # API documentation
│   ├── architecture.md    # Architecture documentation
│   ├── mcp-integration.md # MCP server integration guide
│   └── user-guide.md      # User guide
├── pyproject.toml         # Project configuration
├── .env                   # Environment variables (not in git)
└── README.md
```
Database connection errors: PostgreSQL is not running or the connection settings are wrong. Check the connection parameters in your environment variables.
Rate limit errors: adjust the limits via environment variables:

```bash
RATE_LIMIT_REQUESTS_PER_MINUTE=50
RATE_LIMIT_REQUESTS_PER_DAY=500
```

Missing pgvector: the pgvector PostgreSQL extension is not installed. Follow the pgvector installation guide for your platform.
Documents not re-ingested: this is expected behavior; the system prevents duplicate ingestion.
For detailed logs, run with debug level:

```bash
uv run markdown-rag ./docs --command ingest --level debug
```

Security best practices:

- Never commit `.env` files; add them to `.gitignore`
- Use environment variables for all secrets
- Restrict database access: use firewall rules
- Rotate API keys regularly
- Use read-only database users for query-only deployments
All secrets use the Pydantic `SecretStr` type to prevent accidental logging:

```python
from pydantic import SecretStr

api_key = SecretStr("secret_value")
# str(api_key) and repr(api_key) show '**********';
# use api_key.get_secret_value() to read the real value
```

Contributing:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make changes and add tests
4. Run linters (`uv run ruff check .`)
5. Run type checks (`uv run mypy .`)
6. Commit changes (`git commit -m 'feat: add amazing feature'`)
7. Push to the branch (`git push origin feature/amazing-feature`)
8. Open a Pull Request
Follow conventional commits:
```
feat: add new feature
fix: resolve bug
docs: update documentation
refactor: improve code structure
test: add tests
chore: update dependencies
```
- Management of the embeddings store via an MCP tool
- Support for other embedding models
- Support for other vector stores
This project is licensed under the MIT License.
- LangChain - RAG framework
- Google Gemini - Embedding model
- pgvector - Vector similarity search
- FastMCP - MCP server framework
- Documentation: docs/architecture.md
- Issues: GitHub Issues
- Discussions: GitHub Discussions