A Retrieval-Augmented Generation (RAG) system for question answering and document search over Suprema Biostar2 documentation and related biometric security resources.
- Document Parsing & Cleaning:
- Automated PDF parsing and cleaning using LlamaParse.
- Cleaned data stored in `cleaned_data/` for efficient downstream processing.
- Vector Database:
- Embedding generation with HuggingFace models (configurable).
- Vector storage and retrieval using ChromaDB.
- RAG Pipeline:
- Modular pipeline for document retrieval, question answering, and answer grading.
- Supports both local and cloud LLMs (e.g., Gemini, Llama, OpenAI, HuggingFace).
- API & Frontend:
- FastAPI backend with a `/query` endpoint for chat and search.
- Streamlit-based frontend for an interactive chatbot experience.
- Evaluation:
- Integrated evaluation scripts and metrics for LLM output quality.
- Logging & Observability:
- Langfuse integration for tracing and monitoring.
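Conceptually, the retrieve-answer-grade flow reduces to: embed the question, rank stored chunks by similarity, and hand the top matches to an LLM. The sketch below illustrates that shape only; a toy bag-of-words embedding stands in for the HuggingFace model, an in-memory list stands in for ChromaDB, and all function names are illustrative rather than the project's actual API.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the real pipeline uses a
    # HuggingFace embedding model with vectors stored in ChromaDB.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def answer(query: str, docs: list[str]) -> str:
    # Stand-in for the LLM call: the real pipeline sends the retrieved
    # context plus the question to Gemini/Llama/OpenAI and grades the answer.
    context = retrieve(query, docs)
    return f"Answered '{query}' using {len(context)} retrieved passage(s)."

docs = [
    "BioStar 2 supports fingerprint and face authentication.",
    "The server requires TLS for device communication.",
    "Access groups combine users, doors, and schedules.",
]
print(answer("Which authentication modes does BioStar 2 support?", docs))
```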
```
biometric-rag-agent/
├── cleaned_data/      # Cleaned text files from documentation
├── chroma_vector_db/  # Chroma vector database files
├── data/              # Raw data (PDFs, etc.)
├── diagrams/          # System diagrams and images
├── evaluation/        # Evaluation scripts and metrics
├── frontend/          # Streamlit app and UI components
├── notebooks/         # Jupyter notebooks for prototyping
├── src/               # Main source code (API, agent, data, vector_db, utils)
├── requirements.txt   # Python dependencies
├── Makefile           # Common commands
├── README.md          # Project documentation
└── ...
```
- Install dependencies:
uv pip install -r requirements.txt  # or, for full project management: uv pip install -r pyproject.toml
- Set environment variables:
- Copy `.env.example` to `.env` and fill in the required API keys (Google, Langfuse, etc.).
- Prepare data:
- Place raw PDFs in the `data/` directory.
- Run the data cleaner:
python -m src.data_cleaner
- Build vector indexes:
python -m src.index_builder
- Start the API server:
python -m src.main
- Run the frontend:
streamlit run frontend/app.py
You can use the provided Makefile to run the full application pipeline. The recommended steps are:
- Clean the data:
make clean-data
- Build vector indexes:
make index
- Run the backend API server:
make backend
- Run the frontend:
make frontend
You can also chain these commands as needed for your workflow.
- Access the chatbot UI via the Streamlit app.
- Use the `/query` endpoint for programmatic access.
- Evaluate model performance using the scripts in the `evaluation/` directory.
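For programmatic access, a minimal client might look like the sketch below. The `/query` path comes from this README, but the default port, the payload shape (`{"query": ...}`), and the JSON response format are assumptions to verify against the actual FastAPI schema in `src/`.

```python
import json
from urllib import request

def ask(question: str, base_url: str = "http://localhost:8000") -> dict:
    # Payload shape and default port are assumptions; check the
    # FastAPI route definition in src/ for the real request schema.
    payload = json.dumps({"query": question}).encode()
    req = request.Request(
        f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage (with the backend running):
#   ask("How do I enroll a fingerprint in BioStar 2?")
```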
- `GOOGLE_API_KEY` - Google Gemini API key
- `DB_URL` - Database connection string
- `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, `LANGFUSE_HOST_URL` - Langfuse observability
- See `.env.example` for all options
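A filled-in `.env` might look like the fragment below. All values are placeholders: the `DB_URL` connection-string format is an assumption, and the Langfuse host shown is the hosted-cloud endpoint (adjust for a self-hosted instance).

```
GOOGLE_API_KEY=your-gemini-api-key
DB_URL=postgresql://user:password@localhost:5432/rag_db
LANGFUSE_PUBLIC_KEY=pk-lf-your-public-key
LANGFUSE_SECRET_KEY=sk-lf-your-secret-key
LANGFUSE_HOST_URL=https://cloud.langfuse.com
```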
Pull requests and issues are welcome! Please ensure code is well-documented and tested.
This project includes code and documentation under various open-source licenses. See the `cleaned_data/` directory for license details from upstream documentation sources.
For more information, see the system diagrams in `diagrams/` and the notebooks in `notebooks/`.