A comprehensive hands-on learning path from classical machine learning through transformers to LLM fine-tuning.
This repository contains a complete, structured learning curriculum that takes you from zero to hero in machine learning and LLMs. Instead of treating large language models as mysterious black boxes, you'll build genuine understanding from first principles.
✅ You should use this if you want to:
- Understand how ML/LLMs actually work (not just use them)
- Build models from scratch before using libraries
- Prepare for AI safety research or ML engineering roles
- Learn through hands-on projects, not just theory
- Have a systematic path from basics to advanced topics
❌ This might not be for you if:
- You just want to use pre-built models (use Hugging Face instead)
- You're looking for a quick weekend tutorial
- You prefer video courses over hands-on coding
- You don't have 4-6 months for deep learning
Most ML courses either:
- 🚫 Treat models as black boxes (just call APIs)
- 🚫 Jump straight to deep learning (missing foundations)
- 🚫 Focus on theory without implementation
- 🚫 Use frameworks without understanding internals
This curriculum:
- ✅ Implements everything from scratch first
- ✅ Builds foundations before advanced topics
- ✅ Balances theory with extensive coding
- ✅ Teaches why before showing library shortcuts
By completing this journey, you'll deeply understand:
- Gradient Descent: How optimization really works
- Loss Functions: MSE, cross-entropy, and why they matter
- Classical ML: Trees, SVMs, ensembles from scratch
- Transformers: Self-attention, positional encoding, architecture
- Pretraining: What happens when models learn language
- Fine-tuning: LoRA and parameter-efficient methods
- Evaluation: Proper metrics and experimental design
- Write ML algorithms from scratch (NumPy only)
- Build and train transformer models (PyTorch)
- Fine-tune production LLMs (MLX on Apple Silicon)
- Design rigorous experiments
- Debug models by understanding internals
- Systematic analysis methodology
- Hypothesis-driven experimentation
- Rigorous documentation practices
- Foundation for AI safety research
Goal: Master fundamental ML concepts before approaching deep learning
Projects 1-11: Core foundations
- Linear & Logistic Regression from scratch
- Multi-class classification with softmax
- Regularization and overfitting
- Decision trees and random forests
- Classification metrics deep dive
- Cross-validation strategies
- Support Vector Machines
- Feature engineering
- End-to-end ML pipeline
Bridge Projects (prepare for transformers):
- 11.5: Neural Networks from scratch (backprop, depth vs width)
- 11.75: RNNs from scratch (BPTT, vanishing gradients, why transformers are better)
Key Learning: Gradient descent, loss functions, generalization, proper evaluation, deep learning intuition, sequence modeling
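To give a flavor of the from-scratch code you write in Phase 1, here is a minimal NumPy sketch of gradient descent on a mean-squared-error objective. The function name, toy data, and hyperparameters are illustrative rather than taken from the project notebooks:

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.1, n_steps=2000):
    """Fit y ≈ X @ w + b by minimizing mean squared error."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_steps):
        y_pred = X @ w + b
        error = y_pred - y
        # Gradients of MSE with respect to w and b
        grad_w = (2.0 / n_samples) * X.T @ error
        grad_b = (2.0 / n_samples) * error.sum()
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data: y = 3x + 1 plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 3 * X[:, 0] + 1 + 0.1 * rng.normal(size=200)
w, b = gradient_descent(X, y)
print(w, b)  # should land near [3.0] and 1.0
```

Every loss curve, learning-rate experiment, and regularizer in Phase 1 builds on this same loop.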
Goal: Build and pretrain a transformer to understand base models
Bridge Projects (build intuition before assembly):
- 12.1: Attention Mechanisms from scratch
- 12.25: Embeddings & representation learning via skip-gram
Core Projects:
- Build transformer architecture from scratch
- Tokenization and text preprocessing
- Pretrain tiny transformer on Shakespeare (4-12 hours on M4)
- Analyze pretrained vs random models
Key Learning: Self-attention, multi-head attention, embeddings, pretraining dynamics, why base models work
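The heart of Project 12.1 is scaled dot-product attention, softmax(QKᵀ/√d_k)·V. Below is a minimal single-head NumPy sketch of that formula; it is illustrative only and not taken from the course notebooks, which go on to multi-head attention and the full architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) similarity scores
    weights = softmax(scores, axis=-1)   # each row is a probability distribution
    return weights @ V                   # weighted mixture of value vectors

# Toy example: 4 tokens, 8-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = self_attention(x, x, x)  # queries, keys, values all come from the same tokens
print(out.shape)               # (4, 8)
```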
Goal: Fine-tune Mistral 7B and analyze behavior changes
Projects:
- Instruction tune Mistral 7B with LoRA (using MLX)
- Comparative analysis: base vs tuned model
- Systematic evaluation and documentation
Key Learning: LoRA efficiency, instruction tuning, model evaluation
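LoRA's core idea is to freeze the pretrained weight matrix and learn a small low-rank correction on top of it. The PyTorch sketch below is a hypothetical illustration of that idea; the `LoRALinear` class and its hyperparameters are assumptions for clarity, and the actual Project 16 fine-tuning uses MLX's LoRA tooling rather than this code:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                       # base weights stay frozen
        in_f, out_f = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(r, in_f) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(out_f, r))         # up-projection, starts at zero
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 65,536 trainable params vs ~16.8M in the frozen base layer
```

Because only A and B are trained, the memory and compute cost of fine-tuning drops dramatically, which is what makes tuning a 7B model on a laptop feasible.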
All notebooks now resolve the repository root dynamically instead of using a user-specific absolute path like /Users/mark/git/learning-ml-to-llm. Use either the inline helper pattern:

```python
import sys, pathlib

def add_repo_root(markers=("requirements.txt", "README.md", ".git")):
    here = pathlib.Path.cwd().resolve()
    for candidate in [here] + list(here.parents):
        if any((candidate / m).exists() for m in markers):
            if str(candidate) not in sys.path:
                sys.path.insert(0, str(candidate))
            break

add_repo_root()
```

Or reuse the utility:

```python
from utils.path_helpers import add_repo_root_to_sys_path

add_repo_root_to_sys_path()
```

After this, relative imports like `from utils import metrics` work from any project subfolder without editing paths.
The repository now includes unified backend auto-detection via utils.device.
Priority order:
- MLX (Apple Silicon) if available (`import mlx.core as mx`).
- PyTorch CUDA if `torch.cuda.is_available()`.
- PyTorch MPS if `torch.backends.mps.is_available()`.
- CPU fallback (torch CPU or pure Python).
Usage in notebooks (already inserted in Phase 2 & 3 transformer notebooks):
```python
from utils.device import get_device, backend_info, tensor, ensure_seed

print("Using backend:", backend_info())
ensure_seed(42)

# Create a tensor on the active backend
x = tensor([[1.0, 2.0], [3.0, 4.0]])
```

Override the backend manually:

```bash
export LEARNING_ML_BACKEND=cpu   # options: mlx | cuda | mps | cpu
python scripts/verify_device.py
```

Quick verification script:

```bash
python scripts/verify_device.py
```

This prints the chosen backend and runs a tiny matmul to confirm functionality.
Why this matters:
- Seamless cross-platform execution (Apple Silicon MLX, Linux CUDA, macOS MPS).
- Single import path for device logic keeps notebooks clean.
- Consistent seeding across random, NumPy, torch, and MLX for reproducibility.
See `utils/device.py` for details and helper functions (`backend_name`, `move_to`).
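For reference, the priority order above could be implemented roughly as follows. This is a simplified, assumed sketch; the hypothetical `detect_backend` helper is not the actual contents of `utils/device.py`:

```python
import importlib.util
import os

def detect_backend() -> str:
    """Pick a backend following the priority order above (simplified illustration)."""
    override = os.environ.get("LEARNING_ML_BACKEND")
    if override:
        return override                   # manual override wins
    if importlib.util.find_spec("mlx") is not None:
        return "mlx"                      # Apple Silicon via MLX
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"                 # NVIDIA GPU
        if torch.backends.mps.is_available():
            return "mps"                  # Apple GPU via PyTorch
    except ImportError:
        pass
    return "cpu"                          # universal fallback

print(detect_backend())
```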
- Python 3.8+ installed
- 4-8GB RAM minimum (64GB recommended for Phase 3)
- Jupyter for running notebooks
- Time commitment: 10-20 hours/week for 4-6 months
- Math background: Basic calculus and linear algebra helpful but not required
```bash
# Clone this repository
git clone https://github.com/yourusername/learning-ml-to-llm.git
cd learning-ml-to-llm

# Automated setup (recommended)
./scripts/setup_environment.sh
```

Or manual setup:

```bash
# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Setup Jupyter kernel
python3 -m ipykernel install --user --name=ml-learning --display-name="Python (ML Learning)"
```

Then launch the first notebook:

```bash
# Activate environment
source venv/bin/activate

# Launch Jupyter
jupyter notebook

# Open: projects/phase1_classical_ml/project01_linear_regression/linear_regression_from_scratch.ipynb
```

Work through projects sequentially:

- Complete the notebook
- Run all experiments
- Document learnings in `PROGRESS_LOG.md`
- Move to next project
You're ready to start learning!
```
learning-ml-to-llm/
├── projects/
│   ├── phase1_classical_ml/            # Projects 1-11 (with 4.5, 6.5, 7.5, 7.8)
│   ├── phase2_transformers/            # Projects 12-15 (with 12.5, 13.5)
│   └── phase3_llm_tuning/              # Projects 16-17
├── docs/
│   ├── GLOSSARY.md                     # ML terminology reference
│   ├── LEARNING_OVERVIEW.md            # Learning strategy guide
│   ├── TESTING_GUIDE.md                # ML testing patterns & practices
│   ├── MLOPS_PROFESSIONAL_GUIDE.md     # Experiment tracking, A/B testing, monitoring
│   ├── RESPONSIBLE_AI_GUIDE.md         # Bias, fairness, explainability, privacy
│   └── PROFESSIONAL_TOPICS_OVERVIEW.md # Integration & timeline
├── utils/
│   ├── visualization.py                # Plotting utilities
│   ├── data_generators.py              # Data generation
│   └── metrics.py                      # Evaluation metrics
├── data/
│   ├── raw/                            # Raw datasets
│   └── processed/                      # Processed data
├── scripts/
│   ├── setup_environment.sh            # Setup script
│   └── download_shakespeare.py         # Download training data
├── requirements.txt                    # Python dependencies
├── GETTING_STARTED_PLAN.md             # Detailed getting started guide
├── PROGRESS_LOG.md                     # Learning progress tracking
└── README.md                           # This file
```
Minimum:
- CPU: Any modern processor
- RAM: 4-8GB
- Storage: 5GB
- OS: macOS, Linux, or Windows
Recommended (for the full journey):
- CPU: Apple Silicon (M1/M2/M3/M4) or modern x86
- RAM: 32-64GB for Phase 3 (Mistral fine-tuning)
- Storage: 20GB
- OS: macOS (for MLX optimization) or Linux
| Phase | Project | Time | RAM Needed |
|---|---|---|---|
| Phase 1 | Classical ML (1-11) | Seconds-Minutes | 2-4GB |
| Phase 2 | Build Transformer (12-13) | Minutes | 1-2GB |
| Phase 2 | Pretrain Tiny Model (14) | 4-12 hours | 3-8GB |
| Phase 3 | Fine-tune Mistral (16) | Hours | 20-30GB |
Good news: Phases 1-2 run on any laptop. Only Phase 3 needs serious hardware.
Apple Silicon users: MLX makes your Mac perfect for all phases!
- Implement from scratch first - Understand before using libraries
- Visualize everything - Loss curves, decision boundaries, attention
- Experiment systematically - Vary hyperparameters, observe effects
- Document deeply - Record insights, not just results
- Don't rush - Deep understanding > speed
- Minimum: 1-2 hours/day, 5 days/week
- Recommended: 2-3 hours/day, 5-6 days/week
- Total timeline: 19-23 weeks (~4-6 months)
Track your progress in PROGRESS_LOG.md:
- Projects completed
- Key insights learned
- Challenges encountered
- Experimental results
Learning Paths:
- `classical_ml_learning_path.md` - Detailed Phase 1 guide (Projects 1-11)
- `complete_ml_learning_path_with_pretraining.md` - Full journey (Projects 12-17)
- `mistral_mlx_learning_project.md` - MLX fine-tuning guide
- `GETTING_STARTED_PLAN.md` - Step-by-step setup
Professional Development Guides (in docs/):
- `TESTING_GUIDE.md` - ML testing patterns, pytest, CI/CD
- `MLOPS_PROFESSIONAL_GUIDE.md` - Experiment tracking, A/B testing, monitoring
- `RESPONSIBLE_AI_GUIDE.md` - Bias detection, fairness, explainability, privacy
- `PROFESSIONAL_TOPICS_OVERVIEW.md` - Integration guide and timeline
- StatQuest - Intuitive ML explanations
- 3Blue1Brown - Visual understanding of math
- Papers with Code - Implementation references
- MLX Documentation - Apple silicon optimization
By completing this journey, you'll understand:
Technical Understanding:
- How gradient descent works at a deep level
- What transformers do and why they work
- How pretraining creates language understanding
- Why fine-tuning is efficient and effective
Practical Skills:
- Implement ML/DL algorithms from scratch
- Design and run rigorous experiments
- Evaluate models systematically
- Optimize for Apple silicon (MLX)
Research Capacity:
- Analyze model behavior methodically
- Design experiments to test hypotheses
- Document findings rigorously
- Connect to AI safety research
Most courses do this:
```python
from transformers import AutoModel

model = AutoModel.from_pretrained("mistral-7b")
# ✨ Magic happens ✨
```

You learn to use tools but don't understand what's inside.
This curriculum does this:
```python
# Week 1: Build gradient descent from scratch
def gradient_descent(X, y, learning_rate):
    # You write every line
    ...

# Week 13: Build attention mechanism
def self_attention(Q, K, V):
    # You understand every operation
    ...

# Week 15: Watch your model learn language
# You see loss decrease, watch text generation improve

# Week 18: Now use the tools with full understanding
from mlx_lm import load
# You know exactly what this does internally
```

- Foundations First: Master optimization on simple problems
- Build, Don't Import: Implement before using libraries
- Visualize Everything: See what's happening inside
- Experiment Systematically: Change parameters, observe effects
- Understand, Then Apply: Theory + practice = mastery
For ML Engineering:
- Debug models by understanding internals
- Choose right architectures with confidence
- Optimize training efficiently
- Read papers and implement them
For AI Safety Research:
- Understand how fine-tuning changes behavior
- Analyze model responses rigorously
- Design evaluation methodologies
- Document epistemic properties
For Deep Understanding:
- Know why things work, not just how
- Build intuition through experimentation
- Connect concepts across domains
- Ready for cutting-edge research
- Check the docs: Review `GETTING_STARTED_PLAN.md` and phase READMEs
- Read the code: Utility modules have detailed comments
- Visualize: Use plotting tools to understand behavior
- Experiment: Try changing parameters to build intuition
- Document: Write down your confusion - often solves itself!
- Discussions: Use GitHub Discussions for questions
- Issues: Report bugs or unclear instructions
- Star: If this helps you, star the repo!
- Fork: Adapt for your learning style
Found a bug? Have an improvement? Contributions welcome!
```bash
# Fork the repo

# Create a branch
git checkout -b feature/your-improvement

# Make changes and test

# Commit and push
git commit -m "Add: your improvement"
git push origin feature/your-improvement

# Open a Pull Request
```

Good contributions:
- Fixing errors in notebooks
- Adding visualizations
- Improving documentation
- Adding new experiments
- Sharing your learning insights
Inspired by:
- Andrej Karpathy's "Neural Networks: Zero to Hero"
- Fast.ai's practical deep learning approach
- Stanford CS231n and CS224n courses
- The MLX community
Built with:
- NumPy, scikit-learn, PyTorch for implementations
- MLX for Apple Silicon optimization
- Jupyter for interactive learning
- Lots of coffee and determination
Curriculum Stats:
- 20+ comprehensive projects (17 core + 4 professional extension projects)
- 20+ Jupyter notebooks
- 3 utility modules with 30+ functions
- 10 detailed documentation files (6 learning paths + 4 professional guides)
- ~200-250 hours of hands-on coding (150-200 core + 40-60 professional)
- 4-7 months total learning time (19-23 weeks core + 4-7 weeks professional topics)
Difficulty Progression:
Difficulty rises steadily over time: Phase 1 (Beginner) leads into Phase 2 (Intermediate) and then Phase 3 (Advanced), with the optional Professional Topics extending the curve to roughly 30 weeks.
Current Status: ✅ Complete curriculum (v1.0)
Future Additions (Community-driven):
- Video walkthroughs for each project
- Additional datasets and experiments
- Reinforcement Learning from Human Feedback (RLHF) module
- Distributed training examples
- More AI safety case studies
- Translation to other languages
Want to contribute? See contributing section above!
- Clone this repository
- Run `./scripts/setup_environment.sh`
- Open Project 1 notebook in Jupyter
- Read the theoretical foundation section
- Complete Project 1: Linear Regression
- Experiment with different learning rates
- Document insights in `PROGRESS_LOG.md`
- Start Project 2: Logistic Regression
- Complete Projects 1-4 (Fundamentals)
- Build strong optimization intuition
- Understand loss functions deeply
- Finish all classical ML projects (1-11)
- Ready for transformer architecture
- Built transformer from scratch
- Pretrained your own model
- Fine-tuned Mistral 7B
- Ready for ML research or engineering roles!
- ✅ Implement from scratch first - Understand before optimizing
- ✅ Visualize everything - Plots reveal understanding
- ✅ Experiment freely - Break things to learn
- ✅ Document insights - Your future self will thank you
- ✅ Take your time - Deep learning requires deep understanding
- ❌ Don't skip projects - Each builds on previous ones
- ❌ Don't rush - Speed ≠ understanding
- ❌ Don't copy-paste - Type code to internalize
- ❌ Don't skip visualization - You'll miss key insights
- ❌ Don't work in isolation - Share your progress!
MIT License - see LICENSE for details.
TL;DR: Free to use, modify, and share. Perfect for personal learning, classroom use, or building upon.
"The best way to understand deep learning is to build it from scratch."
β This Curriculum
This journey takes months, not days. But when you finish, you won't just know how to use LLMsβyou'll understand why they work.
You'll be able to:
- Read any ML paper and implement it
- Debug models by understanding internals
- Design new architectures with confidence
- Contribute to cutting-edge research
The goal isn't to finish fast. The goal is to understand deeply.
Take your time. Enjoy the process. Build something amazing.
Ready to start your ML journey?
Read Getting Started Guide | Start Project 1 | ⭐ Star This Repo
Happy Learning! ✨
Built with ❤️ for deep understanding