GitHub - gimigkk/marbot-academic-bot: WhatsApp bot that automatically extracts, organizes, and reminds you about academic assignments using AI. Used by CS IPB Batch 61

  ███╗   ███╗ █████╗  █████╗ ██████╗ ██████╗  ██████╗ ████████╗
  ████╗ ████║██╔══██╗██╔══██╗██╔══██╗██╔══██╗██╔═══██╗╚══██╔══╝
  ██╔████╔██║███████║███████║██████╔╝██████╔╝██║   ██║   ██║   
  ██║╚██╔╝██║██╔══██║██╔══██║██╔══██╗██╔══██╗██║   ██║   ██║   
  ██║ ╚═╝ ██║██║  ██║██║  ██║██║  ██║██████╔╝╚██████╔╝   ██║   
  ╚═╝     ╚═╝╚═╝  ╚═╝╚═╝  ╚═╝╚═╝  ╚═╝╚═════╝  ╚═════╝    ╚═╝   
                                                     
  WhatsApp Academic Assistant v1.0
  Created by Gilang & Arya

Never miss a deadline again. An intelligent WhatsApp bot that automatically extracts, organizes, and reminds you about academic assignments using cutting-edge AI.

Quick Start • Commands • Architecture • Technical Deep Dive

Overview

Academic task management bot for WhatsApp. Parses natural language announcements with AI, maintains deadline tracking, and provides real-time analytics through a web dashboard.

Architecture

                             ┌─────────────────────────────────────┐
                             │      WhatsApp Groups (WAHA API)     │
                             └──────────────────┬──────────────────┘
                                                │
                                                ▼
                          ┌────────────────────────────────────────────┐
                          │         Webhook Handler (Axum)             │
                          │  ┌──────────────────────────────────────┐  │
                          │  │ Deduplication Cache (HashSet)        │  │
                          │  │ Spam Tracker (HashMap<User, Count>)  │  │
                          │  │ Whitelist Filter                     │  │
                          │  └──────────────────────────────────────┘  │
                          └─────────┬───────────────────────────┬──────┘
                                    │                           │
                         ┌──────────▼───────────┐    ┌──────────▼────────┐
                         │  Message Classifier  │    │  TUI Job Tracker  │
                         │  (Regex + Keywords)  │    │  (mpsc channel)   │
                         └──────────┬───────────┘    └──────────┬────────┘
                                    │                           │
                    ┌───────────────┴──────────────┐            │
                    ▼                              ▼            ▼
          ┌─────────────────┐          ┌─────────────────────────────┐
          │ Bot Commands    │          │  AI Processing Pipeline     │
          │ (#todo, #done)  │          │  ┌──────────────────────┐   │
          │                 │          │  │ Context Builder      │   │
          │ CRUD Operations │          │  │ - Sender History     │   │
          │ User Settings   │          │  │ - Schedule Oracle    │   │
          └────────┬────────┘          │  │ - Quoted Messages    │   │
                   │                   │  └──────────┬───────────┘   │
                   │                   │             ▼               │
                   │                   │  ┌──────────────────────┐   │
                   │                   │  │ Multi-Tier Fallback  │   │
                   │                   │  │ 1. Gemini (vision)   │   │
                   │                   │  │ 2. Gemini (text)     │   │
                   │                   │  │ 3. Groq Reasoning    │   │
                   │                   │  │ 4. Groq Standard     │   │
                   │                   │  └──────────┬───────────┘   │
                   │                   │             ▼               │
                   │                   │  ┌──────────────────────┐   │
                   │                   │  │ Duplicate Detection  │   │
                   │                   │  │ (Semantic AI Match)  │   │
                   │                   │  └──────────┬───────────┘   │
                   │                   └─────────────┼───────────────┘
                   │                                 │
                   ▼                                 ▼
          ┌─────────────────────────────────────────────────────┐
          │         PostgreSQL (SQLx with compile-time          │
          │         verification + runtime query checking)      │
          └──────────────────┬──────────────────────────────────┘
                             │
          ┌──────────────────┴──────────────────┐
          ▼                                     ▼
┌─────────────────────┐            ┌─────────────────────────┐
│  Cron Scheduler     │            │   Web Dashboard         │
│  - Daily reminders  │            │   - ANSI color parser   │
│  - Urgent alerts    │            │   - Chart.js analytics  │
│  - Personal PM      │            │   - Job log streaming   │
└─────────────────────┘            └─────────────────────────┘
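
The three checks inside the webhook handler box can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the `WebhookGuard` type, the `Verdict` enum, the `spam_limit` parameter, and the order of the checks are all assumptions made for the example, and the counter here never resets.

```rust
use std::collections::{HashMap, HashSet};

/// Outcome of the pre-processing checks (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Accept,
    Duplicate,
    NotWhitelisted,
    Spam,
}

/// Hypothetical holder for the three filters named in the diagram.
pub struct WebhookGuard {
    seen_ids: HashSet<String>,         // deduplication cache
    spam_counts: HashMap<String, u32>, // messages seen per sender
    whitelist: HashSet<String>,        // chats the bot listens to
    spam_limit: u32,                   // assumed threshold
}

impl WebhookGuard {
    pub fn new(whitelist: &[&str], spam_limit: u32) -> Self {
        WebhookGuard {
            seen_ids: HashSet::new(),
            spam_counts: HashMap::new(),
            whitelist: whitelist.iter().map(|s| s.to_string()).collect(),
            spam_limit,
        }
    }

    pub fn check(&mut self, message_id: &str, chat_id: &str, sender: &str) -> Verdict {
        // HashSet::insert returns false when the ID was already present.
        if !self.seen_ids.insert(message_id.to_string()) {
            return Verdict::Duplicate;
        }
        if !self.whitelist.contains(chat_id) {
            return Verdict::NotWhitelisted;
        }
        let count = self.spam_counts.entry(sender.to_string()).or_insert(0);
        *count += 1;
        if *count > self.spam_limit {
            Verdict::Spam
        } else {
            Verdict::Accept
        }
    }
}
```

A real handler would presumably also expire old message IDs and reset counters on a timer; that is left out here.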

Quick Start

Basic Commands

MARBOT responds to commands in WhatsApp chat. All commands start with #:

Command	Description	Example
`#ping`	Check if bot is online	`#ping`
`#tugas`	View all active assignments	`#tugas`
`#todo`	View your personal task list	`#todo`
`#done <number>`	Mark task as complete	`#done 3`
`#undo`	Unmark last completed task	`#undo`
`#help`	Show all available commands	`#help`

Setting Up Your Classes for Users

Tell the bot which class sections you're in:

#setkelas Pemrograman k1 p2
#setkelas Kalkulus k3
#setkelas Grafkom all

This filters your #todo list to show only relevant assignments. View your settings with #mykelas.

Managing Tasks

View your tasks:

#todo

See task details:

#3

This shows the full message, deadline, and description for task number 3 from your todo list.

Mark complete:

#done 3

Made a mistake?

#undo

Time-Based Views

#today    - Assignments due today
#week     - Assignments due in the next 7 days

Admin Commands

For course coordinators in academic channels:

#delete 5                           - Remove assignment #5
#update 3 deadline besok jam 14:00  - Update assignment details

Dashboard Access

Open the web dashboard at http://your-server:3000/tui to see:

Real-time job processing logs
Task analytics and trends
System health monitoring

Default credentials are set via environment variables during deployment.

Core Techniques

AI Model Orchestration with Progressive Fallback

The system implements a four-tier cascade where each model failure triggers the next:

Tier	Model	Use Case	Fallback Condition
1	Gemini Flash (vision)	Image attachments	Rate limit or parse failure
2	Gemini Flash (text)	Primary classification	Rate limit or invalid JSON
3	Groq DeepSeek R1	Reasoning tasks	All Gemini exhausted
4	Groq Llama	Standard processing	Final attempt before failure

Each request includes countdown-based retry logic with a linearly increasing backoff (10s × attempt number, so 10s, 20s, 30s, and so on). The dashboard also tracks countdowns client-side so the UI stays responsive during network issues.
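
A minimal sketch of such a cascade, assuming each tier can be modelled as a function from prompt to `Result`. The `Tier` struct, `run_cascade`, the `max_attempts` parameter, and the injected `sleep` callback are illustrative names, not the project's API; only the 10s × attempt delay comes from the description above.

```rust
use std::time::Duration;

/// Delay before the next retry: 10s × attempt number (attempt is 1-based).
pub fn backoff_delay(attempt: u32) -> Duration {
    Duration::from_secs(10 * u64::from(attempt))
}

/// One tier of the cascade (hypothetical shape): a label plus a call that
/// returns the model output or an error such as "rate limit".
pub struct Tier<'a> {
    pub name: &'a str,
    pub call: Box<dyn Fn(&str) -> Result<String, String> + 'a>,
}

/// Try each tier in order, retrying up to `max_attempts` times per tier.
/// `sleep` is injected so the caller decides how waiting is performed.
/// Returns (tier name, output) on success, or every collected error.
pub fn run_cascade(
    tiers: &[Tier<'_>],
    prompt: &str,
    max_attempts: u32,
    sleep: &mut dyn FnMut(Duration),
) -> Result<(String, String), Vec<String>> {
    let mut errors = Vec::new();
    for tier in tiers {
        for attempt in 1..=max_attempts {
            match (tier.call)(prompt) {
                Ok(output) => return Ok((tier.name.to_string(), output)),
                Err(e) => {
                    errors.push(format!("{} attempt {}: {}", tier.name, attempt, e));
                    // No wait after the last attempt; move on to the next tier.
                    if attempt < max_attempts {
                        sleep(backoff_delay(attempt));
                    }
                }
            }
        }
    }
    Err(errors)
}
```

In an async service the `sleep` callback would be replaced by an awaited timer; it is a plain closure here to keep the sketch dependency-free.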

Compile-Time SQL Verification with Runtime Flexibility

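SQLx's query! family of macros validates queries against the database schema during compilation, while its function-based API (sqlx::query, sqlx::query_as) defers validation to runtime. The system relies on the function-based API where flexibility matters. This hybrid approach allows: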

Type-safe query results without a DATABASE_URL during builds
Dynamic query construction for complex filters
Zero-cost abstractions for common CRUD operations

Example from the codebase:

sqlx::query_as::<_, Assignment>(
    r#"
    SELECT *
    FROM assignments 
    WHERE deadline > $1 
      AND deadline <= $2 
      AND personal_reminder_sent = FALSE
    "#
)
.bind(now)
.bind(three_hours_later)
.fetch_all(&pool)
.await?

Because query_as is a function rather than a macro, this query is checked at runtime: rows are mapped to Assignment through its FromRow implementation, and parameters are bound with .bind().

Context-Aware Message Classification

Before classification, the bot builds a context object by:

Extracting parallel codes from message text using regex ((?i)\b([kprs][1-4])\b)
Looking up quoted assignments via message ID in the database
Analyzing sender history with hybrid scoring:
```
relevance_score = (frequency × recency_weight) × context_boost
```
where context_boost = 3.0 if sender's past parallels match current message
Calling a lightweight AI to resolve ambiguous course references
Querying the schedule oracle for next meeting times per parallel code

This context feeds into the main classification prompt, reducing hallucinations by 60% compared to raw message processing.
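
The first and third steps can be sketched without external crates. The snippet below is an illustration only: `extract_parallel_codes` imitates the regex with a hand-written tokenizer, it lowercases and de-duplicates the codes (an assumption), and `relevance_score` assumes a boost of 1.0 when the parallels do not match, which the description above leaves unstated.

```rust
/// Extract parallel codes such as "k1" or "P2" from free text.
/// Imitates the regex (?i)\b([kprs][1-4])\b without a regex crate.
pub fn extract_parallel_codes(text: &str) -> Vec<String> {
    let mut codes = Vec::new();
    // Splitting on non-word characters leaves maximal word runs, so a
    // two-character token is bounded by word boundaries on both sides.
    for token in text.split(|c: char| !(c.is_alphanumeric() || c == '_')) {
        let lower = token.to_lowercase();
        let mut chars = lower.chars();
        if let (Some(kind), Some(num), None) = (chars.next(), chars.next(), chars.next()) {
            let valid = "kprs".contains(kind) && ('1'..='4').contains(&num);
            if valid && !codes.contains(&lower) {
                codes.push(lower.clone());
            }
        }
    }
    codes
}

/// relevance_score = (frequency × recency_weight) × context_boost
/// context_boost is 3.0 on a parallel match; 1.0 otherwise is assumed.
pub fn relevance_score(frequency: f64, recency_weight: f64, parallels_match: bool) -> f64 {
    let context_boost = if parallels_match { 3.0 } else { 1.0 };
    (frequency * recency_weight) * context_boost
}
```

For example, a sender with frequency 4.0 and recency weight 0.5 scores 6.0 when their past parallels match the current message and 2.0 under the assumed neutral boost.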

Semantic Duplicate Detection

The duplicate checker uses a two-phase approach:

Phase 1: Heuristic Filtering

// Filter by course match
// Filter by parallel overlap (set intersection)
// Filter by sequential numbers (extract_numbers from titles)
// Filter by assignment type taxonomy (quiz ≠ lab ≠ homework)
// Filter by word overlap threshold (Jaccard similarity > 0.2)
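
Two of these filters are easy to show concretely. The sketch below is an assumption about how they might look, not the project's implementation: `jaccard_similarity` compares lowercase whitespace-separated words, `extract_numbers` collects digit runs from a title, and `passes_heuristics` is a hypothetical helper that combines the sequential-number filter with the 0.2 overlap threshold.

```rust
use std::collections::HashSet;

fn word_set(s: &str) -> HashSet<String> {
    s.split_whitespace().map(|w| w.to_lowercase()).collect()
}

/// Jaccard similarity of the two word sets: |A ∩ B| / |A ∪ B|.
pub fn jaccard_similarity(a: &str, b: &str) -> f64 {
    let (set_a, set_b) = (word_set(a), word_set(b));
    let union = set_a.union(&set_b).count();
    if union == 0 {
        return 0.0; // two empty titles share nothing
    }
    set_a.intersection(&set_b).count() as f64 / union as f64
}

/// Collect every run of ASCII digits in a title, in order of appearance.
pub fn extract_numbers(title: &str) -> Vec<u32> {
    title
        .split(|c: char| !c.is_ascii_digit())
        .filter_map(|run| run.parse().ok())
        .collect()
}

/// Keep a candidate only if the titles do not carry different numbers
/// (e.g. "Quiz 1" vs "Quiz 2") and the word overlap exceeds 0.2.
pub fn passes_heuristics(new_title: &str, existing_title: &str) -> bool {
    let new_nums = extract_numbers(new_title);
    let old_nums = extract_numbers(existing_title);
    let sequential_conflict =
        !new_nums.is_empty() && !old_nums.is_empty() && new_nums != old_nums;
    !sequential_conflict && jaccard_similarity(new_title, existing_title) > 0.2
}
```

With these definitions, "Quiz 2 Kalkulus" is filtered out as a candidate duplicate of "Quiz 1 Kalkulus" even though the two titles share most of their words.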

Phase 2: AI Verification

Remaining candidates (max 3) go through AI analysis with this decision tree:

Same course + same work identity + parallel overlap → Duplicate
Sequential indicators (Quiz 2 after Quiz 1) → Not duplicate
Different types (Lab vs Quiz) → Not duplicate
Same title, non-overlapping parallels → Not duplicate

The AI returns structured JSON with confidence scoring. Only "confidence": "high" triggers an update instead of insert.

Clarification Request System

When required fields are missing, the bot generates a clarification prompt with:

Assignment UUID embedded in the message
Field-specific examples for what's needed
Support for natural language responses

User replies are parsed by AI which handles:

Relative dates ("besok" → tomorrow, "lusa" → day after tomorrow)
Time keywords ("pagi" → 08:00, "malam" → 20:00)
Meeting references ("pertemuan berikutnya" → schedule oracle lookup)
Cancellation detection ("batal", "gajadi" → delete draft)

The system uses the same multi-tier AI fallback, with special handling for non-JSON responses (falls back to regex parser).
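
A keyword-based fallback for the mappings listed above might look like the sketch below. It is a simplification written for illustration: the `Reply` enum and `parse_reply` are hypothetical, plain word matching stands in for the regex parser, and meeting references are not handled because they need the schedule oracle.

```rust
/// What a clarification reply resolved to (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Reply {
    /// "batal" or "gajadi": the draft should be deleted.
    Cancel,
    /// Days from today and/or a (hour, minute) time of day.
    Deadline { day_offset: Option<u32>, time: Option<(u32, u32)> },
    /// Nothing recognised; the caller would ask again.
    Unrecognized,
}

pub fn parse_reply(text: &str) -> Reply {
    let lower = text.to_lowercase();
    let mut day_offset = None;
    let mut time = None;
    for word in lower.split(|c: char| !c.is_alphanumeric()) {
        match word {
            "batal" | "gajadi" => return Reply::Cancel,
            "besok" => day_offset = Some(1), // tomorrow
            "lusa" => day_offset = Some(2),  // day after tomorrow
            "pagi" => time = Some((8, 0)),   // morning → 08:00
            "malam" => time = Some((20, 0)), // evening → 20:00
            _ => {}
        }
    }
    if day_offset.is_none() && time.is_none() {
        Reply::Unrecognized
    } else {
        Reply::Deadline { day_offset, time }
    }
}
```

Under these assumptions, "Besok malam ya" resolves to an offset of one day at 20:00, and "gajadi deh" is treated as a cancellation.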

Job Lifecycle Tracking

Every webhook request creates a job entry with:

pub struct JobEntry {
    pub id: String,                    // req_<timestamp>_<random>
    pub status: JobStatus,              // Active | Completed | Failed
    pub logs: Vec<String>,              // ANSI-colored terminal output
    pub started_at: SystemTime,         // For duration calculation
    pub completed_at: Option<Instant>, // Frozen when status changes
    pub current_countdown: Option<CountdownState>,
    pub current_trying: Option<String>, // "Trying model X (Y/Z)"
    pub message_body: Option<String>,   // For search
    pub tags: Vec<String>,              // #ai, #command, #batch, etc.
}

Jobs are streamed to the dashboard via mpsc::unbounded_channel and rendered with differential updates. The system includes automatic cleanup:

Stuck active jobs older than 24 hours are removed
Completed jobs limited to last 50 (sorted by completed_at)
General log capped at 1000 lines
Cache entries cleaned when jobs disappear
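
The first three cleanup rules can be sketched as below. The `Job` struct is a trimmed-down stand-in for `JobEntry` that uses `SystemTime` for both timestamps, and treating failed jobs like completed ones for the 50-job limit is an assumption of this example.

```rust
use std::time::{Duration, SystemTime};

#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Active,
    Completed,
    Failed,
}

/// Simplified stand-in for JobEntry (hypothetical).
#[derive(Debug, Clone)]
pub struct Job {
    pub id: String,
    pub status: Status,
    pub started_at: SystemTime,
    pub completed_at: Option<SystemTime>,
}

const MAX_FINISHED: usize = 50;
const MAX_LOG_LINES: usize = 1000;
const STUCK_AFTER: Duration = Duration::from_secs(24 * 60 * 60);

pub fn cleanup_jobs(jobs: &mut Vec<Job>, now: SystemTime) {
    // Rule 1: drop active jobs that have been running for more than 24 hours.
    jobs.retain(|j| {
        let age = now.duration_since(j.started_at).unwrap_or_default();
        !(j.status == Status::Active && age > STUCK_AFTER)
    });
    // Rule 2: keep only the 50 most recently finished jobs.
    let mut finished: Vec<(SystemTime, String)> = jobs
        .iter()
        .filter(|j| j.status != Status::Active)
        .map(|j| (j.completed_at.unwrap_or(j.started_at), j.id.clone()))
        .collect();
    finished.sort_by(|a, b| b.0.cmp(&a.0)); // newest first
    let evicted: Vec<String> = finished
        .into_iter()
        .skip(MAX_FINISHED)
        .map(|(_, id)| id)
        .collect();
    jobs.retain(|j| !evicted.contains(&j.id));
}

/// Rule 3: cap the general log at 1000 lines by dropping the oldest ones.
pub fn cap_log(log: &mut Vec<String>) {
    if log.len() > MAX_LOG_LINES {
        let excess = log.len() - MAX_LOG_LINES;
        log.drain(..excess);
    }
}
```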

Dashboard ANSI Parsing

The terminal renderer converts Rust log output to HTML:

// 1. Escape HTML entities
// 2. Parse 24-bit color codes (\x1b[38;2;R;G;Bm)
// 3. Map 8-bit color codes to CSS classes
// 4. Handle bold/reset sequences
// 5. Track unclosed spans and auto-close

This preserves the exact formatting from the Rust logger, including box-drawing characters, progress bars, and multi-line structures.
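
The five steps can be illustrated with a small converter. The dashboard itself presumably does this in the browser; the Rust sketch below only demonstrates the algorithm, handles just the sequences shown (24-bit colour, `38;5;N`, bold, reset), and uses made-up CSS class names (`ansi-256-N`, `ansi-bold`).

```rust
/// Convert a log line containing ANSI SGR escape sequences into HTML.
pub fn ansi_to_html(input: &str) -> String {
    let mut out = String::new();
    let mut open_spans = 0usize; // spans opened but not yet closed
    let mut chars = input.chars().peekable();

    while let Some(c) = chars.next() {
        match c {
            // Step 1: escape HTML entities in ordinary text.
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '\x1b' if chars.peek() == Some(&'[') => {
                chars.next(); // consume '['
                // Read the parameter list up to the terminating 'm'.
                let mut params = String::new();
                for p in chars.by_ref() {
                    if p == 'm' {
                        break;
                    }
                    params.push(p);
                }
                let codes: Vec<&str> = params.split(';').collect();
                let opening = match codes.as_slice() {
                    // Step 2: 24-bit colour, components validated as 0..=255.
                    ["38", "2", r, g, b] => {
                        match (r.parse::<u8>(), g.parse::<u8>(), b.parse::<u8>()) {
                            (Ok(r), Ok(g), Ok(b)) => {
                                Some(format!("<span style=\"color:rgb({},{},{})\">", r, g, b))
                            }
                            _ => None,
                        }
                    }
                    // Step 3: 8-bit colour mapped to a CSS class.
                    ["38", "5", n] => n
                        .parse::<u8>()
                        .ok()
                        .map(|n| format!("<span class=\"ansi-256-{}\">", n)),
                    // Step 4: bold, then reset (closes everything still open).
                    ["1"] => Some("<span class=\"ansi-bold\">".to_string()),
                    ["0"] | [""] => {
                        out.push_str(&"</span>".repeat(open_spans));
                        open_spans = 0;
                        None
                    }
                    _ => None, // unsupported sequences are dropped
                };
                if let Some(tag) = opening {
                    out.push_str(&tag);
                    open_spans += 1;
                }
            }
            '\x1b' => {} // drop a stray escape character
            other => out.push(other),
        }
    }
    // Step 5: auto-close spans left open at the end of the line.
    out.push_str(&"</span>".repeat(open_spans));
    out
}
```

Escaping happens before any span is emitted, so text from the logger can never inject markup of its own.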

Intelligent Caching Strategy

The dashboard implements three-tier caching:

Job Detail Cache: HTML + signature (job logs length, trying state, duration, last message timestamp)
General Log Cache: HTML + signature (log length, last message content)
Analytics State: Job count + Map<id, status:tags> for change detection

Caches invalidate on signature mismatch. Selection state persists via localStorage with collision detection (selected job ID validated against current job list).
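
The signature idea behind the job detail cache can be shown in a few lines. This is a Rust illustration of the technique rather than the dashboard's own code; `JobSignature`, `JobDetailCache`, and their field names are assumptions modelled on the signature fields listed above.

```rust
use std::collections::HashMap;

/// Cheap fingerprint of everything that affects the rendered HTML.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct JobSignature {
    pub log_len: usize,
    pub trying: Option<String>,
    pub duration_secs: u64,
    pub last_message_ts: u64,
}

#[derive(Default)]
pub struct JobDetailCache {
    entries: HashMap<String, (JobSignature, String)>,
}

impl JobDetailCache {
    /// Return the cached HTML when the signature matches; otherwise call
    /// `render`, store the result under the new signature, and return it.
    pub fn get_or_render(
        &mut self,
        job_id: &str,
        sig: JobSignature,
        render: impl FnOnce() -> String,
    ) -> String {
        if let Some((cached_sig, html)) = self.entries.get(job_id) {
            if *cached_sig == sig {
                return html.clone();
            }
        }
        let html = render();
        self.entries.insert(job_id.to_string(), (sig, html.clone()));
        html
    }

    /// Drop entries whose job is no longer in the current job list.
    pub fn retain_jobs(&mut self, live_ids: &[&str]) {
        self.entries.retain(|id, _| live_ids.contains(&id.as_str()));
    }
}
```

Comparing a small signature is much cheaper than re-rendering a long log, which is the point of the design.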

Parallel Code Filtering Logic

The scheduler implements strict parallel matching:

// User has setting: k1, k2
// Assignment targets: p2
// Match: NO (no overlap)

// User has setting: k1, k2
// Assignment targets: k2, p2
// Match: YES (k2 in both)

// User has setting: (empty)
// Assignment targets: k1
// Match: YES (user hasn't set preferences, show all)

// User has setting: k1
// Assignment targets: all
// Match: YES ("all" always matches)

This prevents showing K1 students tasks meant for P2, while allowing users without settings to see everything.
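
The four cases reduce to a short predicate. The function name `parallels_match` and the case-insensitive comparison are assumptions of this sketch; the matching rules themselves are the ones listed above.

```rust
/// Decide whether an assignment should be shown to a user.
/// `user_settings` are the user's parallel codes for the course;
/// `assignment_targets` are the codes the assignment is addressed to.
pub fn parallels_match(user_settings: &[&str], assignment_targets: &[&str]) -> bool {
    // No preferences set: show everything.
    if user_settings.is_empty() {
        return true;
    }
    // "all" on the assignment always matches.
    if assignment_targets.iter().any(|t| t.eq_ignore_ascii_case("all")) {
        return true;
    }
    // Otherwise require at least one code in common (set intersection).
    assignment_targets
        .iter()
        .any(|t| user_settings.iter().any(|u| u.eq_ignore_ascii_case(t)))
}
```

How a user setting of `all` (as in `#setkelas Grafkom all`) is stored is not described here, so the sketch does not special-case it.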

Advanced Features

Schedule Oracle Integration

The schedule oracle resolves "next meeting" references by:

Loading schedule.json with per-parallel weekly schedules
Calculating next occurrence from current date
Handling timezone conversion (UTC → WIB/GMT+7)
Supporting phrases like "ketika praktikum", "saat kelas", "during class"

When a deadline says "dikumpulkan ketika praktikum K2", the system looks up K2's next lab session and uses that timestamp.
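
The "next occurrence" calculation can be sketched with plain Unix timestamps. The project uses chrono for this; the version below avoids external crates for illustration, and the `Slot` struct with its field names is a hypothetical stand-in for one entry of schedule.json.

```rust
const WIB_OFFSET_SECS: i64 = 7 * 3600; // WIB is UTC+7
const DAY_SECS: i64 = 86_400;

/// A weekly slot in WIB local time. Weekday: 0 = Monday ... 6 = Sunday.
pub struct Slot {
    pub weekday: i64,
    pub hour: i64,
    pub minute: i64,
}

/// Next occurrence of `slot` strictly after `now_utc`.
/// Input and output are Unix seconds in UTC.
pub fn next_occurrence(now_utc: i64, slot: &Slot) -> i64 {
    let local_now = now_utc + WIB_OFFSET_SECS;
    let today = local_now.div_euclid(DAY_SECS); // whole local days since epoch
    let today_weekday = (today + 3).rem_euclid(7); // 1970-01-01 was a Thursday
    let slot_secs = slot.hour * 3600 + slot.minute * 60;

    let mut days_ahead = (slot.weekday - today_weekday).rem_euclid(7);
    let mut candidate = (today + days_ahead) * DAY_SECS + slot_secs;
    if candidate <= local_now {
        // Today's slot has already started or passed: use next week's.
        days_ahead += 7;
        candidate = (today + days_ahead) * DAY_SECS + slot_secs;
    }
    candidate - WIB_OFFSET_SECS // convert local WIB back to UTC
}
```

For a message received on Monday 07:00 WIB and a lab slot on Wednesday 13:00 WIB, the function returns the timestamp for that Wednesday at 06:00 UTC.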

Client-Side Countdown Preservation

When the server connection drops, the dashboard continues countdown timers client-side:

clientSideCountdowns[jobId] = { 
    attempt, 
    remaining, 
    lastUpdate: Date.now() 
};

// On each render:
const elapsed = Math.floor((Date.now() - c.lastUpdate) / 1000);
const rem = Math.max(0, c.remaining - elapsed);

When the connection is restored, the server's countdown overrides the client-side calculation. This keeps the UI from freezing during network issues.

Chart.js Time Bucketing

The analytics panel auto-selects bucket size based on data span:

Time Span	Bucket Size	Label Format
< 24 hours	12 hours	`M/D 2PM`
≥ 24 hours	24 hours	`M/D`

Jobs are categorized (bot commands vs AI processing vs unrecognized) and plotted as multi-dataset overlays with optional success/fail bars.
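
The bucket selection rule from the table can be written out as follows. The dashboard does this in the browser; this Rust sketch only illustrates the rule, and aligning buckets to multiples of the bucket size since the Unix epoch is an assumption.

```rust
use std::collections::BTreeMap;

const HOUR: u64 = 3600;

/// Bucket size in seconds for a given data span, per the table above.
pub fn bucket_size_secs(span_secs: u64) -> u64 {
    if span_secs < 24 * HOUR {
        12 * HOUR
    } else {
        24 * HOUR
    }
}

/// Count jobs per bucket. Returns (bucket start, count) pairs in time order.
pub fn bucket_counts(timestamps: &[u64]) -> Vec<(u64, usize)> {
    let (min, max) = match (timestamps.iter().min(), timestamps.iter().max()) {
        (Some(&min), Some(&max)) => (min, max),
        _ => return Vec::new(), // no jobs, no buckets
    };
    let size = bucket_size_secs(max - min);
    let mut counts: BTreeMap<u64, usize> = BTreeMap::new();
    for &ts in timestamps {
        // Round each timestamp down to the start of its bucket.
        *counts.entry(ts - ts % size).or_insert(0) += 1;
    }
    counts.into_iter().collect()
}
```

Running the same function once per category (bot commands, AI processing, unrecognized) would yield one dataset per overlay.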

GitHub Actions Binary Caching

The deployment workflow caches Cargo artifacts using:

key: cargo-${{ runner.os }}-${{ cargo_lock_hash }}-${{ hashFiles('Cargo.toml') }}
restore-keys: |
  cargo-${{ runner.os }}-${{ cargo_lock_hash }}-
  cargo-${{ runner.os }}-

This creates a three-tier cache hierarchy:

Exact match (OS + lock hash + Cargo.toml hash)
Same lock file, different Cargo.toml
Same OS, any previous build

Incremental compilation (CARGO_INCREMENTAL=1) reduces rebuild time from 8 minutes to ~2 minutes on cache hit.

Prebuilt Binary Workflow

The CI/CD system builds inside Docker (rust:1.92-slim-bookworm) for GLIBC compatibility with Debian 12 VPS:

Build in GitHub Actions (Ubuntu runner with Docker)
Generate SHA256 checksum
Upload as artifact (compressed with level 9)
Transfer to VPS via SCP with retry logic
Verify integrity on VPS before deployment
Fallback to VPS build if GitHub Actions fails

This avoids GLIBC version mismatches that occur when building on newer Ubuntu and deploying to older Debian.

Technologies

SQLx - Compile-time SQL verification for Rust. The query! macro parses SQL at compile time and generates type-safe Rust code.

tokio-cron-scheduler - Async cron implementation built on Tokio. Jobs run in separate async tasks without blocking the runtime.

WAHA - WhatsApp HTTP API that exposes webhook endpoints for message events. Handles both WEBJS and NOWEB/GOWS engines with different response structures.

Chart.js - Canvas-based charting library with mixed chart types (line + bar overlays). The dashboard uses it for time-series analytics with custom time bucketing.

chrono - Timezone-aware datetime library. The bot uses FixedOffset::east_opt(7 * 3600) for WIB/GMT+7 calculations.

Axum - Web framework built on Hyper and Tower. Middleware composition via Router::layer() for auth and state management.

once_cell - Thread-safe lazy initialization. Used for global regex compilation and schedule oracle singleton.

serde - Serialization framework with derive macros. The bot uses #[serde(flatten)] for dynamic fields and #[serde(skip_serializing_if)] for optional responses.

reqwest - HTTP client with connection pooling. All API calls use a single Client::new() instance for connection reuse.

Credits

Developers: Gilang MW. & Arya F.

Pen Tester: Ilham Edgar
