Turn Conversation into Clarity.
Clarity is an AI-powered meeting and lecture summarizer designed for students, professionals, and lifelong learners who need to quickly distill key insights from audio recordings. It transforms lengthy conversations into concise summaries, structured transcripts, and actionable items, solving the problem of information overload in a world of remote meetings and online classes. What sets it apart is its sophisticated, decoupled microservices architecture, built entirely on a zero-cost stack, making it a powerful tool and a portfolio-defining showcase of modern full-stack AI development.
- Quickstart
- Features
- Architecture & Design
- Tech Stack
- Repository Structure
- How It Works
- Tests & CI
- Performance & Outcomes
- Contribution & Development
- Roadmap & Known Limitations
- Credits & Acknowledgements
- License & Contact
Get the frontend running locally in a few steps.
- Clone the repository:

  ```bash
  git clone https://github.com/Genious07/ClarityO7.git
  cd ClarityO7
  ```

- Install dependencies:

  ```bash
  npm install
  ```

- Run the development server:

  ```bash
  npm run dev
  ```
The application will be available at http://localhost:9002.
The primary user flow involves uploading an audio file through the web interface.
```tsx
// src/app/page.tsx (Simplified)
"use client";

import { useState } from "react";
import FileUploadZone from "@/components/upload/file-upload-zone";
import ProcessingIndicator from "@/components/status/processing-indicator";
import ResultsDisplay from "@/components/results/results-display";

export default function Home() {
  const [status, setStatus] = useState<"idle" | "processing" | "success">("idle");
  const [results, setResults] = useState(null);

  const handleFileUpload = async (file: File) => {
    // ... logic to upload file and poll for results
  };

  const resetState = () => {
    setStatus("idle");
    setResults(null);
  };

  if (status === "processing") return <ProcessingIndicator />;
  if (status === "success") return <ResultsDisplay results={results} onReset={resetState} />;
  return <FileUploadZone onFileSelect={handleFileUpload} />;
}
```

- **AI-Powered Transcription:** Get accurate, readable transcripts from your audio files, powered by a state-of-the-art speech-to-text model.
- **Intelligent Summarization:** Distill hours of conversation into a concise summary that captures the core ideas and key takeaways.
- **Action Item Detection:** Automatically identify and extract actionable tasks and commitments from the transcript so you never miss a follow-up.
- **Interactive & Animated UI:** A polished, premium user experience with smooth animations powered by GSAP that guide the user through the analysis process.
- **Light & Dark Mode:** A beautiful, modern interface with full support for theme switching to match your preference.
- **Decoupled & Scalable:** Built on a microservices architecture, ensuring each component is independent, maintainable, and can be scaled on its own.
Clarity is built on a decoupled, microservices-oriented architecture designed for scalability, maintainability, and resilience. This "Three-Tier Free-Tier" strategy leverages the strengths of different platforms to create a robust system.
- Upload: The user uploads an audio file via the Next.js Frontend.
- Orchestration: The frontend sends the file to the FastAPI Backend, which saves it, initiates the AI pipeline as a background task, and immediately returns a `task_id`.
- Polling: The frontend polls a `/status/{task_id}` endpoint to show real-time progress.
- AI Processing: The backend calls the three independent AI Microservices in sequence: first Speech-to-Text, then Summarization and Action Item Extraction in parallel.
- Retrieval: Once processing is complete, the backend stores the aggregated results. The frontend makes a final call to a `/result/{task_id}` endpoint to fetch and display the data.
This asynchronous, polling-based approach ensures a non-blocking UI and a smooth user experience, even for long-running audio processing tasks.
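To make the flow concrete, here is a minimal sketch of how such an orchestration layer could look in FastAPI. Only the `task_id` handshake and the `/status/{task_id}` and `/result/{task_id}` endpoints come from the flow above; the `/upload` route name, the in-memory `TASKS` store, and the `run_pipeline` helper are illustrative assumptions, not the project's actual backend code.

```python
# Minimal sketch of the polling-based orchestration flow (illustrative, not
# the project's actual backend). TASKS and run_pipeline are assumptions.
import uuid

from fastapi import BackgroundTasks, FastAPI, File, HTTPException, UploadFile

app = FastAPI()
TASKS: dict = {}  # task_id -> {"status": ..., "result": ...}


def run_pipeline(task_id: str, audio_bytes: bytes) -> None:
    """Run transcription, then summarization and action-item extraction."""
    TASKS[task_id]["status"] = "processing"
    # ... call the three AI microservices and aggregate their outputs ...
    TASKS[task_id]["result"] = {"transcript": "", "summary": "", "action_items": []}
    TASKS[task_id]["status"] = "done"


@app.post("/upload")
async def upload(background_tasks: BackgroundTasks, audio_file: UploadFile = File(...)):
    task_id = str(uuid.uuid4())
    TASKS[task_id] = {"status": "queued", "result": None}
    audio_bytes = await audio_file.read()
    # Schedule the pipeline so the response returns immediately
    background_tasks.add_task(run_pipeline, task_id, audio_bytes)
    return {"task_id": task_id}


@app.get("/status/{task_id}")
async def get_status(task_id: str):
    if task_id not in TASKS:
        raise HTTPException(status_code=404, detail="Unknown task")
    return {"status": TASKS[task_id]["status"]}


@app.get("/result/{task_id}")
async def get_result(task_id: str):
    task = TASKS.get(task_id)
    if task is None or task["status"] != "done":
        raise HTTPException(status_code=404, detail="Result not ready")
    return task["result"]
```

In a production deployment the task store would live in a database or cache rather than process memory, but the shape of the `task_id` handshake stays the same.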
The technology stack was chosen to prioritize performance, developer experience, and the ability to deploy on a zero-cost footprint.
| Tier | Technology | Rationale |
|---|---|---|
| Frontend | Next.js (React) | A production-grade framework for building fast, modern user interfaces with a great developer experience. |
| | Tailwind CSS | A utility-first CSS framework that allows for rapid, custom UI development without leaving the HTML. |
| | GSAP | A professional-grade animation library used for creating a premium, interactive user experience. |
| Backend | Python & FastAPI | Python provides access to a rich ML ecosystem, while FastAPI offers a high-performance, asynchronous framework for building robust APIs. |
| AI Models | Transformers | The core library for leveraging state-of-the-art open-source models for transcription, summarization, and classification. |
| | Docker | Each AI microservice and the backend are containerized, ensuring consistency and portability across environments. |
The repository is organized into a monorepo-like structure with the frontend and backend services clearly separated.
```
/
├── FastAPI-Backend/
│   ├── Action Item Extraction Microservice/
│   │   ├── app.py              # FastAPI app for action item classification
│   │   └── Dockerfile          # Container definition
│   ├── Speech-to-Text Microservice/
│   │   ├── app.py              # FastAPI app for audio transcription
│   │   └── Dockerfile
│   └── Summarization Microservice/
│       ├── app.py              # FastAPI app for text summarization
│       └── Dockerfile
├── public/                     # Static assets for the frontend
├── src/
│   ├── app/                    # Next.js App Router pages and layouts
│   │   ├── layout.tsx          # Root layout
│   │   └── page.tsx            # Main application page
│   ├── components/             # Reusable React components
│   │   ├── layout/             # Header, Footer, etc.
│   │   ├── results/            # Components for displaying results
│   │   ├── status/             # Processing indicators
│   │   └── upload/             # File upload zone
│   ├── hooks/                  # Custom React hooks (e.g., use-toast)
│   └── lib/                    # Utility functions
├── package.json                # Frontend dependencies and scripts
└── tailwind.config.ts          # Tailwind CSS configuration
```
The core of the application is the AI pipeline orchestrated by the FastAPI backend. When an audio file is uploaded, it's first standardized and then sent to the transcription microservice.
The Speech-to-Text Microservice uses the distil-whisper/distil-large-v3 model, which is optimized for performance on CPU hardware. It implements a chunking strategy to handle long audio files, ensuring accurate transcription beyond the model's 30-second window.
```python
# FastAPI-Backend/Speech-to-Text Microservice/app.py
# ... imports and model loading ...

@app.post("/transcribe")
async def transcribe_audio(audio_file: UploadFile = File(...)):
    # ... error handling ...

    # Read and resample audio to the required 16kHz mono format
    audio_bytes = await audio_file.read()
    speech, sr = librosa.load(io.BytesIO(audio_bytes), sr=16000, mono=True)

    # The pipeline automatically handles chunking for long audio
    result = asr_pipeline(speech, chunk_length_s=30, stride_length_s=5)

    return {"transcription": result["text"]}
```

Once the transcript is ready, it's sent to the Summarization and Action Item microservices, which use models fine-tuned for conversational data to produce the final results.
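The summarization and action-item services are not reproduced here, but a minimal sketch of a `transformers`-based summarization endpoint is shown below for orientation. The `/summarize` route, the `TranscriptRequest` model, and the `philschmid/bart-large-cnn-samsum` checkpoint (a BART model fine-tuned on dialogue data) are assumptions for illustration, not necessarily the project's actual code.

```python
# FastAPI-Backend/Summarization Microservice/app.py (illustrative sketch)
# The model name below is one example of a dialogue-summarization checkpoint;
# the checkpoint actually deployed by Clarity may differ.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
summarizer = pipeline("summarization", model="philschmid/bart-large-cnn-samsum")


class TranscriptRequest(BaseModel):
    text: str


@app.post("/summarize")
async def summarize(request: TranscriptRequest):
    # Very long transcripts may need to be split into chunks that fit the
    # model's context window before summarization.
    summary = summarizer(request.text, max_length=150, min_length=30, do_sample=False)
    return {"summary": summary[0]["summary_text"]}
```

The Action Item Extraction microservice can follow the same request/response shape, with the summarization pipeline swapped for a classifier that flags sentences containing tasks or commitments.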
This project is set up with continuous integration using GitHub Actions.
- How to run tests: `npm test`
- Test coverage: We aim for >90% test coverage on critical components.
- CI Status: The build status badge at the top of this README reflects the latest status of the `main` branch.
- 6x Faster Transcription: By using the distilled Whisper model, transcription is approximately 6 times faster than the original `large-v3` model on CPU, with only a ~1% increase in word error rate.
- Zero-Cost Deployment: The entire application is architected to run on the free tiers of Vercel, Fly.io, and Hugging Face Spaces, demonstrating cost-effective engineering.
- Asynchronous Processing: The non-blocking, asynchronous backend ensures the UI remains responsive and provides real-time feedback, even during long processing jobs.
Contributions are welcome! Please follow the guidelines below.
- Branch Strategy: Create a new branch for each feature or bug fix (`feature/your-feature` or `fix/your-bug`).
- Development Workflow:
  - Fork the repository.
  - Create your feature branch.
  - Commit your changes with clear, descriptive messages.
  - Push to your branch and open a pull request against `main`.
- Code Style: This project uses Prettier and ESLint for code formatting and linting. Please run `npm run lint` before committing.
- Roadmap:
  - Speaker diarization to identify and label different speakers in the transcript.
  - Real-time transcription from microphone input.
  - Integration with calendar and project management tools to export action items.
- Known Limitations:
  - Transcription accuracy may vary depending on audio quality, background noise, and speaker accents.
  - The application currently supports one audio file at a time.
This project was inspired by and built upon the excellent open-source work of the following communities and individuals:
- Hugging Face for their `transformers` library and the Spaces platform.
- OpenAI for the original Whisper model.
- The creators of FastAPI, Next.js, and Tailwind CSS.
This project is licensed under the MIT License. See the LICENSE file for details.
Maintained by Satwik.
For questions, collaborations, or feedback, please reach out at [email protected].