---
title: Hydro Rag
emoji: 🐢
colorFrom: indigo
colorTo: green
sdk: docker
pinned: false
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Hydro RAG

A smart question-answering system for hydrogen certification documents, built on RAG (Retrieval-Augmented Generation).
## Features

- Ask questions about hydrogen certifications in plain English
- Get accurate answers drawn from official certification documents
- Compare different chunking strategies (basic vs. hybrid)
- Automatic certification detection from your questions
## Quick Start

```bash
# Clone the repository
git clone https://github.com/Goodnight77/hydro_rag
cd hydro_rag

# Create environment file
cp .env.example .env
```

Edit the `.env` file and add:

```
OPENAI_API_KEY=your_openai_key_here
GROQ_API_KEY=your_groq_key_here
ELASTICSEARCH_HOSTS=http://localhost:9200
```

```bash
# Build and run
docker build -t hydrogen-rag .
docker run -p 7860:7860 hydrogen-rag
```

Open http://localhost:7860 in your browser and start asking questions!
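The container expects a reachable Elasticsearch instance at `ELASTICSEARCH_HOSTS`. One way to run both services together is a Compose file along the following lines — this is a sketch, not a file shipped with the repository, and the image tag is an assumption:

```yaml
# Hypothetical docker-compose.yml (not part of the repo).
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false   # dev only; enable security in production
    ports:
      - "9200:9200"
  app:
    build: .
    ports:
      - "7860:7860"
    env_file: .env
    environment:
      # Override localhost so the app reaches the sibling container
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    depends_on:
      - elasticsearch
```

With a file like this in place, `docker compose up --build` replaces the two manual `docker` commands above.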
## Example Questions

- "What are the purity requirements in GH2 Standard?"
- "How does CertifHy certification work?"
- "What are the safety protocols for hydrogen storage?"
## Supported Certifications

- GH2 Standard
- CertifHy (NGC & RFNBO)
- ISO 19880 Hydrogen Quality
- ISCC EU/PLUS/CORSIA
- REDcert-EU
- TUV Rheinland H2.21
- And more...
## How It Works

- Document Processing: Converts PDFs, DOCX, and XLSX files into searchable chunks
- Smart Chunking: Uses both basic and semantic chunking strategies
- Question Classification: Automatically detects which certification you're asking about
- Hybrid Search: Combines text matching and vector similarity for better results
- AI-Powered Answers: Uses LLMs to generate clear, accurate responses
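The hybrid search step can be pictured as a weighted blend of a lexical score and a vector-similarity score. The sketch below is illustrative only — the function names are hypothetical, and the actual implementation in `elastic/` presumably delegates scoring to Elasticsearch rather than computing it in Python:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query, text):
    """Toy lexical score: fraction of query terms found in the chunk."""
    terms = query.lower().split()
    hits = sum(1 for t in terms if t in text.lower())
    return hits / len(terms) if terms else 0.0

def hybrid_score(query, chunk_text, query_vec, chunk_vec, alpha=0.5):
    """Blend lexical and vector scores; alpha weights the lexical side."""
    lexical = keyword_score(query, chunk_text)
    semantic = cosine_similarity(query_vec, chunk_vec)
    return alpha * lexical + (1 - alpha) * semantic
```

Chunks would then be ranked by `hybrid_score`, so a chunk that matches the query terms exactly *and* sits close to the query in embedding space outranks one that does only one or the other.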
## Tech Stack

- Frontend: Streamlit
- Backend: FastAPI
- Search: Elasticsearch
- LLMs: Groq (Llama 3.3, Gemma2)
- Embeddings: OpenAI text-embedding-3-small
- Documents: PDF, DOCX, XLSX support
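The "basic" chunking strategy mentioned above typically means fixed-size windows with some overlap so that sentences spanning a boundary appear in two chunks. A minimal sketch — the sizes and the character-based (rather than token-based) splitting are assumptions; the project's `chunking/` module may differ:

```python
def basic_chunks(text, size=500, overlap=50):
    """Split text into fixed-size character windows with overlap.

    Illustrative only: the repo's chunking/ module may use different
    sizes or token-aware splitting.
    """
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```

The semantic strategy differs mainly in where it cuts: instead of fixed offsets, it splits where the embedding similarity between adjacent passages drops.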
## Local Development

```bash
# Install dependencies
pip install -r requirements.txt

# Start Elasticsearch
# (see the Docker setup above, or install it locally)

# Run the Streamlit frontend
streamlit run streamlit.py

# Start the FastAPI server
uvicorn app:app --host 0.0.0.0 --port 8000
```
## API Usage

```bash
# Query the API
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "What are GH2 purity requirements?"}'
```

## Project Structure

```
├── app.py               # FastAPI backend
├── streamlit.py         # Streamlit frontend
├── chunking/            # Text chunking strategies
├── elastic/             # Elasticsearch setup
├── embeddings/          # OpenAI embeddings
├── prompting/           # LLM query processing
├── file_processing.py   # Document parsing
└── Dockerfile           # Container setup
```
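The same `/query` endpoint shown in the curl example can be called from Python with nothing beyond the standard library. This sketch assumes the FastAPI server is running on `localhost:8000`; the response shape is whatever `app.py` returns:

```python
import json
import urllib.request

def build_query_request(question, url="http://localhost:8000/query"):
    """Build a POST request matching the curl example above."""
    payload = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(question):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_request(question)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `ask("What are GH2 purity requirements?")` returns the parsed JSON body from the server.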
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request
Need help? Open an issue or check the documentation in each module.