A machine learning solution for the Kaggle AI Mathematical Olympiad (AIMO) competition, focused on solving complex mathematical problems using AI.
This repository contains a submission for the AIMO Progress Prize competition on Kaggle. The solution implements mathematical problem-solving capabilities with LaTeX processing and symbolic computation.
- Python 3.8+
- macOS, Linux, or Windows
- Clone this repository:

  ```shell
  git clone <repository-url>
  cd ai-mathematical-olympiad-progress-prize-3
  ```

- Create a virtual environment:

  ```shell
  python3 -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  ```

- Install dependencies:

  ```shell
  pip install -r requirements.txt
  ```

- For Mac M-series (Metal GPU acceleration), install llama-cpp-python with Metal support:

  ```shell
  CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python --force-reinstall --no-cache-dir
  ```

- Create a `.env` file with your HuggingFace token (optional, for gated models):

  ```shell
  echo "HF_TOKEN=your_token_here" > .env
  ```

The project supports three backends: GGUF (quantized), Transformers (full precision), and Remote (HTTP inference).
This is the default configuration. It uses ~4.4 GB of RAM and runs on Mac M-series via Metal.

In `src/config.py`:

```python
MODEL_NAME = "bartowski/NuminaMath-7B-TIR-GGUF"
MODEL_FILE = "NuminaMath-7B-TIR-Q4_K_M.gguf"
MODEL_BACKEND = "gguf"
```

For systems with a dedicated GPU and sufficient VRAM.

In `src/config.py`:

```python
MODEL_NAME = "AI-MO/NuminaMath-7B-TIR"
MODEL_BACKEND = "transformers"
```

Use this if you host the model remotely (e.g., on Scaleway) and expose an OpenAI-compatible API.
Set environment variables (recommended via `.env`):

```shell
MODEL_BACKEND=remote
REMOTE_BASE_URL=https://<your-host>/v1
REMOTE_MODEL=<served-model-name>
REMOTE_API_KEY=<optional>
# Optional: defaults shown
REMOTE_API_STYLE=chat  # or completions
REMOTE_TIMEOUT=120
REMOTE_VERIFY_SSL=true
```

Notes:

- For `REMOTE_API_STYLE=chat`, the code calls `POST {REMOTE_BASE_URL}/chat/completions`.
- For `REMOTE_API_STYLE=completions`, it calls `POST {REMOTE_BASE_URL}/completions`.
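The mapping between the two API styles can be sketched with a small helper. `build_remote_request` is an illustrative name, not part of `src/`; it only shows how the `REMOTE_*` variables translate into an OpenAI-compatible request:

```python
import os


def build_remote_request(prompt, style=None):
    """Build the URL and JSON payload for the remote backend.

    Hypothetical helper: mirrors the REMOTE_* environment variables
    described above, with illustrative defaults for local testing.
    """
    base_url = os.environ.get("REMOTE_BASE_URL", "http://localhost:8000/v1")
    model = os.environ.get("REMOTE_MODEL", "numina-math-7b-tir")
    style = style or os.environ.get("REMOTE_API_STYLE", "chat")

    if style == "chat":
        url = f"{base_url}/chat/completions"
        payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    else:  # completions
        url = f"{base_url}/completions"
        payload = {"model": model, "prompt": prompt}
    return url, payload
```

The returned URL and payload can then be sent with any HTTP client (e.g. `requests.post`), passing `REMOTE_API_KEY` as a bearer token if the host requires one.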
The project uses Python logging for progress/debug output.

- Logs print to stdout.
- Logs also write to `logs/app.log` by default.

Environment variables:

```shell
LOG_LEVEL=INFO
LOG_TO_FILE=true
LOG_DIR=logs
LOG_FILE=app.log
```

When running local evaluation with `save_traces=True`, each per-problem trace JSON also includes an `info_logs` field containing the INFO logs captured during that solve.
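The env-driven setup can be sketched as below. This is a hypothetical reconstruction, not the actual contents of `src/logger.py`, which may differ in naming and handler formatting:

```python
import logging
import os
import sys
from pathlib import Path


def setup_logger(name="aimo", level=None):
    """Configure a logger from the LOG_* environment variables.

    Illustrative sketch: stdout handler always, plus a file handler
    under LOG_DIR/LOG_FILE when LOG_TO_FILE is true.
    """
    level = (level or os.environ.get("LOG_LEVEL", "INFO")).upper()
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on re-import
        logger.addHandler(logging.StreamHandler(sys.stdout))
        if os.environ.get("LOG_TO_FILE", "true").lower() == "true":
            log_dir = Path(os.environ.get("LOG_DIR", "logs"))
            log_dir.mkdir(parents=True, exist_ok=True)
            log_file = os.environ.get("LOG_FILE", "app.log")
            logger.addHandler(logging.FileHandler(log_dir / log_file))
    return logger
```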
When running local evaluation, correct solves are also exported as JSONL under `training_data/correct_attempts.jsonl`.

Each JSONL line contains the problem text, the expected and predicted answers, and a subset of attempts that produced the correct final answer (preferably code-backed or boxed), for later fine-tuning.

Notes:

- This export can be enabled even when `save_traces=False` (it will still run the trace-enabled solver to collect attempt texts, but won't write per-problem trace JSON files).
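A few lines of Python suffice to load the export back for fine-tuning. The field names in the toy record below (`problem`, `expected_answer`, `attempts`) are illustrative; the actual JSONL schema is defined by the evaluation code:

```python
import json
from pathlib import Path


def load_correct_attempts(path="training_data/correct_attempts.jsonl"):
    """Yield one dict per exported solve from the JSONL file."""
    with Path(path).open() as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```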
- numpy - Numerical computing
- pandas - Data manipulation and analysis
- sympy - Symbolic mathematics
- scikit-learn - Machine learning utilities
- matplotlib - Data visualization
- jupyter - Interactive notebooks
- requests - HTTP library for API calls
- transformers - HuggingFace model loading
- llama-cpp-python - GGUF quantized model support
- huggingface_hub - Model downloading
```
.
├── app.py                        # Main entry point
├── requirements.txt              # Python dependencies
├── reference.csv                 # Reference data for problems
├── test.csv                      # Test dataset
├── sample_submission.csv         # Sample submission format
├── src/                          # Source code modules
│   ├── __init__.py               # Package exports
│   ├── config.py                 # Configuration settings
│   ├── model.py                  # Model loading (GGUF & Transformers)
│   ├── solver.py                 # Core problem solver
│   ├── prompt.py                 # Prompt templates
│   ├── answer.py                 # Answer extraction utilities
│   ├── voting.py                 # Majority voting for self-consistency
│   ├── code_executor.py          # Python code execution (TIR)
│   ├── latex.py                  # LaTeX processing
│   ├── data.py                   # Data loading utilities
│   ├── logger.py                 # Logging configuration
│   ├── validation.py             # Answer validation and constraints
│   ├── classification.py         # Problem type classification
│   └── evaluation.py             # Local testing utilities
├── kaggle_evaluation/            # Kaggle evaluation module
│   ├── aimo_3_gateway.py
│   ├── aimo_3_inference_server.py
│   └── core/                     # Core evaluation logic
└── README.md                     # This file
```
Main solver with multi-attempt weighted voting:
- `solve_math_problem(problem)` - Solves a problem with classification, validation, and weighted voting
- `solve_single_attempt(problem)` - Single solution attempt with code execution and metadata
Model loading supporting two backends:
- GGUF - Quantized models via llama-cpp-python (recommended for 16GB RAM)
- Transformers - Full precision HuggingFace models
Tool-Integrated Reasoning (TIR) support:
- Extracts Python code blocks from model output
- Safely executes code with SymPy for verification
- Returns computed answers from code execution
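The extraction step can be sketched as follows. `extract_code_blocks` is an illustrative name, not the actual API of `src/code_executor.py`:

```python
import re

FENCE = "`" * 3  # a markdown code fence, built up to render cleanly here

# Match fenced blocks, optionally tagged "python", capturing the body
CODE_BLOCK_RE = re.compile(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(model_output):
    """Return the contents of fenced code blocks in the model output."""
    return [block.strip() for block in CODE_BLOCK_RE.findall(model_output)]
```

Each extracted block would then be run in a restricted subprocess with SymPy available and a timeout, and its printed output parsed as a candidate answer.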
Prompt templates optimized for math olympiad problems:

- `build_prompt_tir()` - Optimized for NuminaMath-TIR models with optional hints
- `build_prompt_classified()` - Type-specific prompts based on classification
- `build_prompt()` - Standard math solving prompt
- `build_prompt_with_cot()` - Chain-of-thought prompt
Self-consistency via majority and weighted voting:

- `majority_vote()` - Simple majority voting
- `weighted_majority_vote()` - Confidence-based weighted voting:
  - Code execution success (+0.5 weight)
  - Code and text answer agreement (+0.3)
  - Greedy decoding first attempt (+0.2)
  - Boxed answer found (+0.2)
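Under that scheme, a weighted vote might look like the sketch below, assuming a base weight of 1.0 per attempt and illustrative metadata keys (the real `src/voting.py` may use different names):

```python
from collections import defaultdict


def weighted_majority_vote(attempts):
    """Pick the answer with the highest total confidence weight.

    attempts: iterable of (answer, metadata) pairs; metadata flags and
    the base weight of 1.0 are assumptions for illustration.
    """
    scores = defaultdict(float)
    for answer, meta in attempts:
        weight = 1.0  # base vote per attempt (assumed)
        if meta.get("code_success"):
            weight += 0.5
        if meta.get("code_text_agree"):
            weight += 0.3
        if meta.get("greedy_first"):
            weight += 0.2
        if meta.get("boxed"):
            weight += 0.2
        scores[answer] += weight
    return max(scores, key=scores.get)
```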
Problem type classification for specialized strategies:
- Detects: number_theory, algebra, combinatorics, geometry, sequence, probability
- `classify_problem()` - Returns the problem type
- `get_problem_hint()` - Returns specialized hints for prompts
- `get_difficulty_estimate()` - Estimates problem complexity
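A keyword-based classifier along these lines is one plausible implementation; the keyword lists and the fallback to `algebra` below are assumptions, not the actual contents of `src/classification.py`:

```python
TYPE_KEYWORDS = {
    # Illustrative keyword lists, not the project's real ones
    "number_theory": ["divisible", "prime", "modulo", "remainder", "integer"],
    "geometry": ["triangle", "circle", "angle", "perimeter"],
    "combinatorics": ["ways", "arrange", "choose", "permutation"],
    "probability": ["probability", "random", "dice"],
    "sequence": ["sequence", "term", "recurrence"],
}


def classify_problem(problem):
    """Return the type whose keywords match most often; default to algebra."""
    text = problem.lower()
    best, best_hits = "algebra", 0
    for ptype, words in TYPE_KEYWORDS.items():
        hits = sum(1 for w in words if w in text)
        if hits > best_hits:
            best, best_hits = ptype, hits
    return best
```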
Answer validation and constraint checking:
- `validate_and_fix_answer()` - Validates and fixes answers
- `extract_constraints()` - Extracts modulo/range constraints from problems
- `check_answer_agreement()` - Measures agreement across attempts
Answer extraction with multiple strategies:
- `\boxed{}` notation (highest priority)
- "Final answer" statements
- Last integer fallback
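The first and last strategies can be sketched with two small regex helpers (illustrative, not the actual `src/answer.py` API):

```python
import re


def extract_boxed_answer(text):
    """Return the content of the last \\boxed{...} in the text, or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def extract_last_integer(text):
    """Fallback: return the last integer mentioned in the text, or None."""
    nums = re.findall(r"-?\d+", text)
    return int(nums[-1]) if nums else None
```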
LaTeX processing utilities:
- `clean_latex(text)` - Normalizes mathematical expressions
- reference.csv - Reference problems and solutions
- test.csv - Test dataset for evaluation
- sample_submission.csv - Template for submission format
The AI Mathematical Olympiad (AIMO) challenges AI systems to solve difficult mathematical problems across multiple domains, including:
- Algebra
- Geometry
- Number Theory
- Combinatorics
- Analysis
- Data Preparation - Load and preprocess problem data from CSV files
- LaTeX Processing - Clean and normalize mathematical expressions
- Problem Solving - Apply mathematical algorithms and ML models
- Evaluation - Validate solutions against test cases
- Submission - Generate formatted submission file
```python
from src import solve_math_problem, clean_latex

# Solve a math problem
problem = r"Find the sum of all positive integers $n$ such that $n^2 + 12n - 2007$ is a perfect square."
answer = solve_math_problem(problem)
print(f"Answer: {answer}")

# Clean LaTeX expressions
equation = r"$$\begin{equation}x^2 + y^2 = z^2\end{equation}$$"
cleaned = clean_latex(equation)
print(cleaned)  # Output: $$x^2 + y^2 = z^2$$
```

- Problem Input - Receives the math problem in LaTeX format
- Problem Classification - Automatically classifies into type (algebra, number theory, etc.)
- Model Loading - Uses cached model for efficiency across multiple problems
- Prompt Building - Creates optimized prompt with type-specific hints
- Multi-Attempt Generation - Generates 3 solutions (configurable)
  - First attempt: greedy decoding (deterministic, +0.2 weight)
  - Subsequent attempts: temperature sampling (diverse)
- Code Execution - If the model outputs Python code, executes it with SymPy (+0.5 weight)
  - Enhanced retry: automatically fixes common errors and retries
- Answer Extraction - Extracts the answer from `\boxed{}` (+0.2 weight) or code output
- Answer Validation - Validates against problem constraints (modulo, range)
- Iterative Refinement - If agreement < 50%, adds refinement attempts with feedback
- Weighted Voting - Returns highest-confidence answer using metadata weights
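The refinement trigger in the workflow above depends on an agreement score across attempts; a minimal version (illustrative, not the actual `check_answer_agreement` implementation) is:

```python
from collections import Counter


def answer_agreement(answers):
    """Fraction of attempts that agree with the most common answer."""
    if not answers:
        return 0.0
    _, top_count = Counter(answers).most_common(1)[0]
    return top_count / len(answers)
```

If this score falls below `REFINEMENT_THRESHOLD` (0.5 by default), the solver adds further attempts that include feedback from the earlier ones.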
Test the solver on validation problems:

```shell
python app.py
```

This runs `test_locally(num_samples=5)` by default, which:

- Loads problems from `reference.csv`
- Splits them into train/validation sets
- Tests on 5 validation problems
- Reports accuracy

For the actual competition submission, modify `app.py`:

```python
if __name__ == "__main__":
    main()  # Instead of test_locally()
```

Then run:

```shell
python app.py
```

Edit `src/config.py` to tune the solver:
```python
# Number of solution attempts per problem (majority voting)
NUM_ATTEMPTS = 3

# Temperature for diverse sampling (0 = greedy, >0 = sampling)
TEMPERATURE = 0.7

# Maximum tokens to generate
MAX_NEW_TOKENS = 2048

# Enable/disable Python code execution
ENABLE_CODE_EXECUTION = True

# Code execution timeout (seconds)
CODE_TIMEOUT = 10

# Enhanced code retry with automatic error fixing
ENABLE_CODE_RETRY = True
CODE_MAX_RETRIES = 2

# Iterative refinement when answers disagree
ENABLE_ITERATIVE_REFINEMENT = True
REFINEMENT_THRESHOLD = 0.5  # Trigger if agreement below 50%
MAX_REFINEMENT_ATTEMPTS = 2
```

The model is automatically cached after the first problem to improve performance when solving multiple problems:
```python
from src import get_cached_model_and_tokenizer, clear_model_cache

# Get cached model (loads once, reuses thereafter)
model, tokenizer = get_cached_model_and_tokenizer()

# Clear cache if needed (e.g., to free memory)
clear_model_cache()
```

For questions or issues regarding this solution, please refer to the competition guidelines on Kaggle.
This project is submitted as part of the Kaggle AIMO Progress Prize competition.
Last Updated: December 2025