GreenEarthX/Hydro_RAG

---
title: Hydro Rag
emoji: 🐢
colorFrom: indigo
colorTo: green
sdk: docker
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Hydrogen Certification RAG System

A smart question-answering system for hydrogen certification documents using RAG (Retrieval-Augmented Generation).

What it does

  • Ask questions about hydrogen certifications in plain English
  • Get accurate answers from official certification documents
  • Compare different chunking strategies (basic vs hybrid)
  • Automatic certification detection from your questions

Quick Start

1. Setup Environment

# Clone the repository
git clone https://github.com/Goodnight77/hydro_rag
cd hydro_rag

# Create environment file
cp .env.example .env

2. Add API Keys

Edit the .env file and add your keys:

OPENAI_API_KEY=your_openai_key_here
GROQ_API_KEY=your_groq_key_here
ELASTICSEARCH_HOSTS=http://localhost:9200
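These settings can be read at startup like this (a minimal sketch using only the standard library; the `Settings` dataclass and `load_settings` helper are illustrative, not part of the repository):

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    openai_api_key: str
    groq_api_key: str
    elasticsearch_hosts: str

def load_settings() -> Settings:
    # Fail fast if a required key is missing from the environment.
    missing = [k for k in ("OPENAI_API_KEY", "GROQ_API_KEY") if not os.getenv(k)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {missing}")
    return Settings(
        openai_api_key=os.environ["OPENAI_API_KEY"],
        groq_api_key=os.environ["GROQ_API_KEY"],
        # Falls back to the default local Elasticsearch host.
        elasticsearch_hosts=os.getenv("ELASTICSEARCH_HOSTS", "http://localhost:9200"),
    )
```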

3. Run with Docker

# Build and run (--env-file passes the API keys from .env into the container)
docker build -t hydrogen-rag .
docker run -p 7860:7860 --env-file .env hydrogen-rag

4. Use the App

Open http://localhost:7860 in your browser and start asking questions!

Example Questions

  • "What are the purity requirements in GH2 Standard?"
  • "How does CertifHy certification work?"
  • "What are the safety protocols for hydrogen storage?"

Available Certifications

  • GH2 Standard
  • CertifHy (NGC & RFNBO)
  • ISO 19880 Hydrogen Quality
  • ISCC EU/PLUS/CORSIA
  • REDcert-EU
  • TUV Rheinland H2.21
  • And more...

How it Works

  1. Document Processing: Converts PDFs, DOCX, and XLSX files into searchable chunks
  2. Smart Chunking: Uses both basic and semantic chunking strategies
  3. Question Classification: Automatically detects which certification you're asking about
  4. Hybrid Search: Combines text matching and vector similarity for better results
  5. AI-Powered Answers: Uses LLMs to generate clear, accurate responses
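The basic chunking step above can be sketched as a fixed-size splitter with overlap (a sketch only; the actual sizes and the semantic strategy live in the chunking/ module and may differ):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks (basic strategy).

    The overlap keeps sentences that straddle a boundary retrievable
    from both neighboring chunks.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```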

Tech Stack

  • Frontend: Streamlit
  • Backend: FastAPI
  • Search: Elasticsearch
  • LLMs: Groq (Llama 3.3, Gemma2)
  • Embeddings: OpenAI text-embedding-3-small
  • Documents: PDF, DOCX, XLSX support
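The hybrid search combining Elasticsearch text matching with vector similarity can be illustrated as a single request body with a BM25 `match` clause plus a `knn` section (a sketch: the index field names `content` and `embedding` are hypothetical, not taken from the repository):

```python
def build_hybrid_query(question: str, question_vector: list[float], k: int = 5) -> dict:
    """Build an Elasticsearch request body: BM25 match plus kNN over dense vectors."""
    return {
        # Lexical part: classic BM25 full-text match on the chunk text.
        "query": {"match": {"content": {"query": question}}},
        # Vector part: approximate kNN over the embedding field.
        "knn": {
            "field": "embedding",
            "query_vector": question_vector,
            "k": k,
            "num_candidates": 10 * k,
        },
        "size": k,
    }
```

Elasticsearch 8.x scores the two parts together when `query` and `knn` appear in the same request, which is what makes the search "hybrid".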

Development

Local Setup

# Install dependencies
pip install -r requirements.txt

# Start Elasticsearch
# (See Docker setup or install locally)

# Run the app
streamlit run streamlit.py

API Usage

# Start FastAPI server
uvicorn app:app --host 0.0.0.0 --port 8000

# Query the API
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "What are GH2 purity requirements?"}'

Project Structure

├── app.py              # FastAPI backend
├── streamlit.py        # Streamlit frontend
├── chunking/           # Text chunking strategies
├── elastic/            # Elasticsearch setup
├── embeddings/         # OpenAI embeddings
├── prompting/          # LLM query processing
├── file_processing.py  # Document parsing
└── Dockerfile          # Container setup

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

Need help? Open an issue or check the documentation in each module.
