
AI Mathematical Olympiad (AIMO) Progress Prize - Solution #3

A machine learning solution for the Kaggle AI Mathematical Olympiad (AIMO) competition, focused on solving complex mathematical problems using AI.

πŸ“‹ Project Overview

This repository contains a submission for the AIMO Progress Prize competition on Kaggle. The solution implements mathematical problem-solving capabilities with LaTeX processing and symbolic computation.

πŸš€ Quick Start

Prerequisites

  • Python 3.8+
  • macOS, Linux, or Windows

Installation

  1. Clone this repository:
git clone <repository-url>
cd ai-mathematical-olympiad-progress-prize-3
  2. Create a virtual environment:
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. For Mac M-series (Metal GPU acceleration), install llama-cpp-python with Metal support:
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python --force-reinstall --no-cache-dir
  5. Create a .env file with your HuggingFace token (optional, for gated models):
echo "HF_TOKEN=your_token_here" > .env

Model Configuration

The project supports three backends: GGUF (quantized), Transformers (full precision), and Remote (HTTP inference).

Option 1: Quantized GGUF (Recommended for 16GB RAM)

This is the default configuration. It uses ~4.4 GB of RAM and runs on Mac M-series machines via Metal.

In src/config.py:

MODEL_NAME = "bartowski/NuminaMath-7B-TIR-GGUF"
MODEL_FILE = "NuminaMath-7B-TIR-Q4_K_M.gguf"
MODEL_BACKEND = "gguf"

Option 2: Full Precision Transformers (Requires ~14GB+ VRAM)

For systems with dedicated GPU and sufficient VRAM.

In src/config.py:

MODEL_NAME = "AI-MO/NuminaMath-7B-TIR"
MODEL_BACKEND = "transformers"

Option 3: Remote Model (Scaleway hosted / vLLM / OpenAI-compatible)

Use this if you host the model remotely (e.g., Scaleway) and expose an OpenAI-compatible API.

Set environment variables (recommended via .env):

MODEL_BACKEND=remote
REMOTE_BASE_URL=https://<your-host>/v1
REMOTE_MODEL=<served-model-name>
REMOTE_API_KEY=<optional>
# Optional: defaults shown
REMOTE_API_STYLE=chat           # or completions
REMOTE_TIMEOUT=120
REMOTE_VERIFY_SSL=true

Notes:

  • For REMOTE_API_STYLE=chat, the code calls POST {REMOTE_BASE_URL}/chat/completions.
  • For REMOTE_API_STYLE=completions, it calls POST {REMOTE_BASE_URL}/completions.
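As a sketch of how a client might assemble these requests from the REMOTE_* variables (the helper below is hypothetical, not the project's actual client code, and no network call is made):

```python
import os

def build_remote_request(problem: str) -> tuple[str, dict]:
    """Build the URL and JSON payload for an OpenAI-compatible server,
    following the REMOTE_* variables described above."""
    base = os.environ["REMOTE_BASE_URL"].rstrip("/")
    model = os.environ["REMOTE_MODEL"]
    style = os.environ.get("REMOTE_API_STYLE", "chat")
    if style == "chat":
        url = f"{base}/chat/completions"
        payload = {"model": model,
                   "messages": [{"role": "user", "content": problem}]}
    else:  # "completions"
        url = f"{base}/completions"
        payload = {"model": model, "prompt": problem}
    return url, payload

# Demo values only; a real run would POST `payload` to `url` with requests.
os.environ.update(REMOTE_BASE_URL="https://example.com/v1",
                  REMOTE_MODEL="numina", REMOTE_API_STYLE="chat")
url, payload = build_remote_request("Compute 1 + 1.")
```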

πŸͺ΅ Logging

The project uses Python logging for progress/debug output.

  • Logs print to stdout.
  • Logs also write to logs/app.log by default.

Environment variables:

LOG_LEVEL=INFO
LOG_TO_FILE=true
LOG_DIR=logs
LOG_FILE=app.log

When running local evaluation with save_traces=True, each per-problem trace JSON also includes an info_logs field containing captured INFO logs for that solve.
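A minimal sketch of how the LOG_* variables above can be wired into Python's standard logging module (hypothetical; the project's src/logger.py may differ in detail):

```python
import logging
import os
from pathlib import Path

def setup_logging() -> logging.Logger:
    """Configure a logger from the LOG_* environment variables above."""
    logger = logging.getLogger("aimo")
    logger.setLevel(os.environ.get("LOG_LEVEL", "INFO"))  # accepts level names
    logger.addHandler(logging.StreamHandler())            # console output
    if os.environ.get("LOG_TO_FILE", "true").lower() == "true":
        log_dir = Path(os.environ.get("LOG_DIR", "logs"))
        log_dir.mkdir(parents=True, exist_ok=True)
        log_file = log_dir / os.environ.get("LOG_FILE", "app.log")
        logger.addHandler(logging.FileHandler(log_file))
    return logger

os.environ["LOG_TO_FILE"] = "false"  # keep this demo console-only
log = setup_logging()
```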

πŸ“š Training Data Export

When running local evaluation, correct solves are also exported as JSONL under:

  • training_data/correct_attempts.jsonl

Each JSONL line contains the problem text, expected/predicted answers, and a subset of attempts that produced the correct final answer (preferably code-backed or boxed), for later fine-tuning.

Notes:

  • This export can be enabled even when save_traces=False (it will still run the trace-enabled solver to collect attempt texts, but won’t write per-problem trace JSON files).
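Because the export is plain JSONL (one JSON object per line), it can be loaded with a few lines of Python. The record below is synthetic and its field names are illustrative, not guaranteed to match the exporter's exact schema:

```python
import json
import os
import tempfile

def load_jsonl(path: str) -> list[dict]:
    """Read one JSON record per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Write a synthetic record shaped like the export described above.
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write(json.dumps({"problem": "What is 1 + 1?",
                        "expected": 2, "predicted": 2}) + "\n")
records = load_jsonl(f.name)
os.unlink(f.name)
```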

πŸ“¦ Dependencies

  • numpy - Numerical computing
  • pandas - Data manipulation and analysis
  • sympy - Symbolic mathematics
  • scikit-learn - Machine learning utilities
  • matplotlib - Data visualization
  • jupyter - Interactive notebooks
  • requests - HTTP library for API calls
  • transformers - HuggingFace model loading
  • llama-cpp-python - GGUF quantized model support
  • huggingface_hub - Model downloading

πŸ“ Project Structure

.
β”œβ”€β”€ app.py                          # Main entry point
β”œβ”€β”€ requirements.txt                # Python dependencies
β”œβ”€β”€ reference.csv                   # Reference data for problems
β”œβ”€β”€ test.csv                        # Test dataset
β”œβ”€β”€ sample_submission.csv           # Sample submission format
β”œβ”€β”€ src/                            # Source code modules
β”‚   β”œβ”€β”€ __init__.py                 # Package exports
β”‚   β”œβ”€β”€ config.py                   # Configuration settings
β”‚   β”œβ”€β”€ model.py                    # Model loading (GGUF & Transformers)
β”‚   β”œβ”€β”€ solver.py                   # Core problem solver
β”‚   β”œβ”€β”€ prompt.py                   # Prompt templates
β”‚   β”œβ”€β”€ answer.py                   # Answer extraction utilities
β”‚   β”œβ”€β”€ voting.py                   # Majority voting for self-consistency
β”‚   β”œβ”€β”€ code_executor.py            # Python code execution (TIR)
β”‚   β”œβ”€β”€ latex.py                    # LaTeX processing
β”‚   β”œβ”€β”€ data.py                     # Data loading utilities
β”‚   β”œβ”€β”€ logger.py                   # Logging configuration
β”‚   β”œβ”€β”€ validation.py               # Answer validation and constraints
β”‚   β”œβ”€β”€ classification.py           # Problem type classification
β”‚   └── evaluation.py               # Local testing utilities
β”œβ”€β”€ kaggle_evaluation/              # Kaggle evaluation module
β”‚   β”œβ”€β”€ aimo_3_gateway.py
β”‚   β”œβ”€β”€ aimo_3_inference_server.py
β”‚   └── core/                       # Core evaluation logic
└── README.md                       # This file

πŸ”§ Core Modules

src/solver.py

Main solver with multi-attempt weighted voting:

  • solve_math_problem(problem) - Solves a problem with classification, validation, and weighted voting
  • solve_single_attempt(problem) - Single solution attempt with code execution and metadata

src/model.py

Model loading for the local backends:

  • GGUF - Quantized models via llama-cpp-python (recommended for 16GB RAM)
  • Transformers - Full precision HuggingFace models

(The remote backend sends requests to an OpenAI-compatible endpoint instead of loading a model locally; see Model Configuration above.)

src/code_executor.py

Tool-Integrated Reasoning (TIR) support:

  • Extracts Python code blocks from model output
  • Safely executes code with SymPy for verification
  • Returns computed answers from code execution
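The extract-and-execute loop can be sketched as follows. This is a simplified stand-in, not the project's executor: the regex is illustrative, and a real implementation sandboxes the code and enforces CODE_TIMEOUT rather than calling plain exec():

```python
import re

FENCE = "`" * 3  # avoids literal backtick fences inside this example

# Matches ```python ... ``` blocks in model output (illustrative pattern).
CODE_RE = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)

def extract_code_blocks(model_output: str) -> list[str]:
    """Pull fenced Python blocks out of the model's text."""
    return [m.strip() for m in CODE_RE.findall(model_output)]

def run_block(code: str) -> dict:
    """Execute one block and return its namespace.
    WARNING: plain exec() is unsafe; a real executor sandboxes this."""
    namespace: dict = {}
    exec(code, namespace)
    return namespace

output = f"Let me verify:\n{FENCE}python\nanswer = 2 ** 10\n{FENCE}\nSo the answer is 1024."
blocks = extract_code_blocks(output)
```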

src/prompt.py

Prompt templates optimized for math olympiad:

  • build_prompt_tir() - Optimized for NuminaMath-TIR models with optional hints
  • build_prompt_classified() - Type-specific prompts based on classification
  • build_prompt() - Standard math solving prompt
  • build_prompt_with_cot() - Chain-of-thought prompt

src/voting.py

Self-consistency via majority and weighted voting:

  • majority_vote() - Simple majority voting
  • weighted_majority_vote() - Confidence-based weighted voting:
    • Code execution success (+0.5 weight)
    • Code and text answer agreement (+0.3)
    • Greedy decoding first attempt (+0.2)
    • Boxed answer found (+0.2)
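The weighting scheme above can be sketched like this. The weights come from the list; the attempt structure and flag names are illustrative assumptions about the metadata each attempt carries:

```python
from collections import defaultdict

def weighted_majority_vote(attempts: list[dict]):
    """Score each candidate answer using the bonuses listed above
    and return the highest-scoring one."""
    scores: dict = defaultdict(float)
    for a in attempts:
        weight = 1.0                     # base weight per attempt
        if a.get("code_success"):        # code execution succeeded
            weight += 0.5
        if a.get("code_text_agree"):     # code and text answers agree
            weight += 0.3
        if a.get("greedy_first"):        # deterministic first attempt
            weight += 0.2
        if a.get("boxed"):               # \boxed{} answer found
            weight += 0.2
        scores[a["answer"]] += weight
    return max(scores, key=scores.get)

attempts = [
    {"answer": 42, "greedy_first": True, "boxed": True},           # 1.4
    {"answer": 17},                                                # 1.0
    {"answer": 42, "code_success": True, "code_text_agree": True}, # 1.8
]
```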

src/classification.py

Problem type classification for specialized strategies:

  • Detects: number_theory, algebra, combinatorics, geometry, sequence, probability
  • classify_problem() - Returns problem type
  • get_problem_hint() - Returns specialized hints for prompts
  • get_difficulty_estimate() - Estimates problem complexity
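A keyword-based classifier in this spirit might look like the sketch below. The keyword lists are illustrative; the real module may use richer heuristics and covers all six types listed above:

```python
KEYWORDS = {
    "number_theory": ("prime", "divisible", "modulo", "remainder"),
    "geometry": ("triangle", "circle", "angle"),
    "combinatorics": ("ways", "arrange", "choose"),
    "probability": ("probability", "random", "dice"),
}

def classify_problem(problem: str) -> str:
    """Return the first problem type whose keywords appear in the text,
    falling back to a default bucket."""
    text = problem.lower()
    for ptype, words in KEYWORDS.items():
        if any(w in text for w in words):
            return ptype
    return "algebra"  # default when nothing matches
```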

src/validation.py

Answer validation and constraint checking:

  • validate_and_fix_answer() - Validates and fixes answers
  • extract_constraints() - Extracts modulo/range constraints from problems
  • check_answer_agreement() - Measures agreement across attempts
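For example, a modulo constraint can be detected and applied roughly as follows (the regex and function bodies are illustrative sketches of the utilities named above, not the project's actual code):

```python
import re

def extract_constraints(problem: str) -> dict:
    """Detect 'modulo N' / 'divided by N' phrasing and record the modulus."""
    constraints = {}
    m = re.search(r"\b(?:modulo|mod|divided by)\s+(\d+)", problem, re.I)
    if m:
        constraints["modulo"] = int(m.group(1))
    return constraints

def validate_and_fix_answer(answer: int, constraints: dict) -> int:
    """Reduce the answer into the required range when a modulus is present."""
    mod = constraints.get("modulo")
    return answer % mod if mod else answer
```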

src/answer.py

Answer extraction with multiple strategies:

  • \boxed{} notation (highest priority)
  • "Final answer" statements
  • Last integer fallback
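The priority order above can be sketched as a cascade of regexes (a simplified, integer-only illustration, not the module's actual implementation):

```python
import re

def extract_answer(text: str):
    """Try \\boxed{}, then a 'final answer' statement, then
    fall back to the last integer in the text."""
    m = re.search(r"\\boxed\{(-?\d+)\}", text)        # highest priority
    if m:
        return int(m.group(1))
    m = re.search(r"final answer\D*(-?\d+)", text, re.I)
    if m:
        return int(m.group(1))
    ints = re.findall(r"-?\d+", text)                 # last-integer fallback
    return int(ints[-1]) if ints else None
```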

src/latex.py

LaTeX processing utilities:

  • clean_latex(text) - Normalizes mathematical expressions
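A minimal sketch consistent with the usage example shown later in this README; the actual function likely handles many more normalization cases:

```python
import re

def clean_latex(text: str) -> str:
    """Strip redundant \\begin{equation}/\\end{equation} wrappers
    (one of possibly many normalizations)."""
    return re.sub(r"\\(?:begin|end)\{equation\}", "", text)
```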

πŸ“Š Data Files

  • reference.csv - Reference problems and solutions
  • test.csv - Test dataset for evaluation
  • sample_submission.csv - Template for submission format

πŸ† AIMO Competition

The AI Mathematical Olympiad (AIMO) challenges AI systems to solve difficult mathematical problems across multiple domains, including:

  • Algebra
  • Geometry
  • Number Theory
  • Combinatorics
  • Analysis

πŸ”„ Workflow

  1. Data Preparation - Load and preprocess problem data from CSV files
  2. LaTeX Processing - Clean and normalize mathematical expressions
  3. Problem Solving - Apply mathematical algorithms and ML models
  4. Evaluation - Validate solutions against test cases
  5. Submission - Generate formatted submission file

πŸ“ Usage Example

from src import solve_math_problem, clean_latex

# Solve a math problem
problem = r"Find the sum of all positive integers $n$ such that $n^2 + 12n - 2007$ is a perfect square."
answer = solve_math_problem(problem)
print(f"Answer: {answer}")

# Clean LaTeX expressions
equation = r"$$\begin{equation}x^2 + y^2 = z^2\end{equation}$$"
cleaned = clean_latex(equation)
print(cleaned)  # Output: $$x^2 + y^2 = z^2$$

βš™οΈ How It Works

  1. Problem Input - Receives math problem in LaTeX format
  2. Problem Classification - Automatically classifies into type (algebra, number theory, etc.)
  3. Model Loading - Uses cached model for efficiency across multiple problems
  4. Prompt Building - Creates optimized prompt with type-specific hints
  5. Multi-Attempt Generation - Generates 3 solutions (configurable)
    • First attempt: Greedy decoding (deterministic, +0.2 weight)
    • Subsequent attempts: Temperature sampling (diverse)
  6. Code Execution - If model outputs Python code, executes it with SymPy (+0.5 weight)
    • Enhanced retry: Automatically fixes common errors and retries
  7. Answer Extraction - Extracts answer from \boxed{} (+0.2 weight) or code output
  8. Answer Validation - Validates against problem constraints (modulo, range)
  9. Iterative Refinement - If agreement < 50%, adds refinement attempts with feedback
  10. Weighted Voting - Returns highest-confidence answer using metadata weights

πŸ§ͺ Running the Solution

Local Testing

Test the solver on validation problems:

python app.py

This runs test_locally(num_samples=5) by default, which:

  1. Loads problems from reference.csv
  2. Splits into train/validation sets
  3. Tests on 5 validation problems
  4. Reports accuracy

Kaggle Submission

For actual competition submission, modify app.py:

if __name__ == "__main__":
    main()  # Instead of test_locally()

Then run:

python app.py

Configuration Options

Edit src/config.py to tune the solver:

# Number of solution attempts per problem (majority voting)
NUM_ATTEMPTS = 3

# Temperature for diverse sampling (0 = greedy, >0 = sampling)
TEMPERATURE = 0.7

# Maximum tokens to generate
MAX_NEW_TOKENS = 2048

# Enable/disable Python code execution
ENABLE_CODE_EXECUTION = True

# Code execution timeout (seconds)
CODE_TIMEOUT = 10

# Enhanced code retry with automatic error fixing
ENABLE_CODE_RETRY = True
CODE_MAX_RETRIES = 2

# Iterative refinement when answers disagree
ENABLE_ITERATIVE_REFINEMENT = True
REFINEMENT_THRESHOLD = 0.5  # Trigger if agreement below 50%
MAX_REFINEMENT_ATTEMPTS = 2

Caching

The model is automatically cached after the first problem to improve performance when solving multiple problems:

from src import get_cached_model_and_tokenizer, clear_model_cache

# Get cached model (loads once, reuses thereafter)
model, tokenizer = get_cached_model_and_tokenizer()

# Clear cache if needed (e.g., to free memory)
clear_model_cache()

πŸ“§ Contact & Support

For questions or issues regarding this solution, please refer to the competition guidelines on Kaggle.

πŸ“„ License

This project is submitted as part of the Kaggle AIMO Progress Prize competition.


Last Updated: December 2025
