Open-source implementation of the Transformer architecture, based on the paper *Attention Is All You Need* (Vaswani et al., 2017), enriched with my personal notes and explanations.
This project provides a complete implementation of the Transformer architecture for English → French translation. Built entirely from scratch with PyTorch, it demonstrates the full end-to-end pipeline, from data acquisition and preprocessing through training and evaluation to inference.
It aims to be:
- Minimal: the implementation mirrors the paper's architecture (multi-head attention, positional encodings, and the encoder-decoder design), as a way for me to deepen my own understanding
- Modular: distinct modules handle data preprocessing, model architecture, training workflows, and evaluation, ensuring clarity and easy expansion
- Customizable: hyperparameters can be adjusted to reproduce the base, big, or smaller Transformer variants (number of layers, hidden size, dropout, ...)
- Hands-on: provides end-to-end scripts for downloading and preprocessing datasets (like WMT14 in this work), training models on NVIDIA GPU (CUDA) machines, and assessing performance using BLEU scores
Key features:
- Complete Transformer Implementation: full encoder-decoder architecture with multi-head attention, positional encoding, and feed-forward networks (see the sketch after this list)
- Compute Optimization: efficient attention computation and a KV cache for faster inference (sketched in the inference section below)
- End-to-End Pipeline: data download, preprocessing, training, evaluation, and inference
- BPE Tokenization: SentencePiece-based byte-pair encoding for robust subword tokenization
- Training Best Practices: label smoothing, gradient clipping, learning rate warmup, early stopping
- Comprehensive Evaluation: BLEU score computation with sample translations
- Flexible Inference: interactive, single-sentence, and batch-file translation modes
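
To make the first bullet concrete, here is a minimal sketch of scaled dot-product attention and the sinusoidal positional encoding as defined in the paper. Names and tensor shapes are illustrative; the actual implementation lives in `src/scripts/transformer.py` and may be organized differently.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    # q, k, v: (batch, heads, seq_len, d_k); the mask broadcasts over the scores
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

def sinusoidal_positional_encoding(max_len, d_model):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)); PE(pos, 2i+1) = cos(...)."""
    # Assumes an even d_model, as in the paper's base configuration.
    pos = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
    div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe  # added to the token embeddings before the first encoder layer
```

In the multi-head variant, `q`, `k`, and `v` are linear projections of the same input, split across heads before calling the attention function and concatenated afterwards.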
The project uses the WMT14 English-French dataset, which includes:
- Europarl v7: European Parliament proceedings
- Common Crawl: Web-scraped parallel corpus
- News Commentary: News articles and commentary
The dataset can be automatically downloaded and preprocessed.
Set up the environment:

```bash
python -m venv .venv
.venv\Scripts\activate        # Windows
# source .venv/bin/activate   # Linux/Mac
uv sync
```

0. Notes: transformer.ipynb
This notebook walks through the implementation of the architecture with my personal notes and explanations; it is a helpful place to start.
1. Download the WMT14 English-French dataset:
```bash
python -m src.data.download_data

# optional arguments
python -m src.data.download_data --data-dir ./custom/path
```

2. Split the dataset and train the SentencePiece BPE model:
```bash
python -m src.data.preprocess_data

# optional arguments
python -m src.data.preprocess_data \
  --raw-dir ./src/data/raw \
  --processed-dir ./src/data/processed \
  --vocab-size 32000 \
  --max-sentences 200000 \
  --train-ratio 0.98 \
  --val-ratio 0.01
```

This creates:
- Train/validation/test splits
- SentencePiece BPE model (`sp_bpe.model`)
- BPE-encoded files (`.bpe` extension)
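
For reference, the trained BPE model can be loaded directly with the `sentencepiece` Python package. This is a usage sketch assuming the default output path above; the repo's own loading code may differ.

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="src/data/processed/sp_bpe.model")

text = "Hello, my name is zedems!"
print(sp.encode(text, out_type=str))  # subword pieces (strings)
ids = sp.encode(text)                 # integer ids fed to the model
print(sp.decode(ids))                 # round-trips back to the original text
```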
3. Train the Transformer model:
```bash
python -m src.scripts.train

# optional arguments
python -m src.scripts.train \
  --d-model 512 \
  --heads 8 \
  --d-ff 2048 \
  --num-layers 6 \
  --dropout 0.1 \
  --batch-size 16 \
  --epochs 10 \
  --lr 5e-4 \
  --max-seq-len 512 \
  --warmup-steps 4000 \
  --early-stopping-patience 3 \
  --model-name my_model
```

The model checkpoint will be saved to `src/scripts/checkpoints/`.
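
The flags above map onto standard Transformer training practices: label smoothing, gradient clipping, and the paper's warmup schedule, `lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)`. Here is a hedged sketch of how these pieces fit together in PyTorch; since the script also exposes a fixed `--lr`, its actual schedule may be wired differently:

```python
import torch

def noam_lr(step, d_model=512, warmup_steps=4000):
    # Schedule from the paper (section 5.3): linear warmup for
    # warmup_steps updates, then inverse-square-root decay.
    step = max(step, 1)  # guard against step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

model = torch.nn.Linear(512, 512)  # stand-in for the real Transformer
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)  # label smoothing
optimizer = torch.optim.Adam(model.parameters(), lr=1.0,
                             betas=(0.9, 0.98), eps=1e-9)
# With base lr=1.0, the lambda's return value IS the learning rate.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lr)

x = torch.randn(16, 512)          # stand-in batch of decoder states
y = torch.randint(0, 512, (16,))  # stand-in target token ids
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
optimizer.step()
scheduler.step()                  # advance the warmup/decay schedule
```

Early stopping happens at the loop level: training halts once validation loss has not improved for `--early-stopping-patience` consecutive epochs, and the best checkpoint is the one saved.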
4. Evaluate the trained model and compute the BLEU score:
```bash
python -m src.scripts.evaluate --model-name best --num-samples 5

# optional arguments
python -m src.scripts.evaluate \
  --model-name best \
  --checkpoints-path ./custom/path/model.pt \
  --results-file results \
  --num-samples 10 \
  --batch-size 16
```

This will:
- Compute BLEU score on the test set
- Display sample translations
- Save results to a JSON file
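
Corpus-level BLEU can be computed in a few lines with a library such as `sacrebleu`; the snippet below is illustrative, and the evaluate script may use a different BLEU implementation or tokenization:

```python
import sacrebleu

hypotheses = ["le chat est sur le tapis"]    # model translations, one per sentence
references = [["le chat est sur le tapis"]]  # one list per reference stream
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```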
5. Translate English sentences to French using the trained model.
```bash
# Interactive Mode
python -m src.scripts.inference --model-name best --interactive

# Single Sentence
python -m src.scripts.inference --model-name best --sentence "Hello, my name is zedems!"

# Batch File Translation
python -m src.scripts.inference \
  --model-name best \
  --input-file sentences.txt \
  --output-file translations.txt \
  --batch-size 8
```
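
A note on the KV cache listed in the features: during autoregressive decoding, the keys and values of already-generated tokens never change, so they can be computed once and reused at every step. A minimal sketch of the idea for a single attention layer (illustrative; the repo's cache layout may differ):

```python
import math
import torch

def attend_with_cache(q_new, k_new, v_new, cache=None):
    # q_new, k_new, v_new: (batch, heads, 1, d_k) projections of the newest token.
    # The cache stores keys/values of all earlier tokens so each is computed once.
    if cache is not None:
        k_new = torch.cat([cache["k"], k_new], dim=2)  # append along the sequence axis
        v_new = torch.cat([cache["v"], v_new], dim=2)
    scores = q_new @ k_new.transpose(-2, -1) / math.sqrt(q_new.size(-1))
    out = torch.softmax(scores, dim=-1) @ v_new
    return out, {"k": k_new, "v": v_new}  # pass the cache to the next decode step
```

With the cache, each decoding step attends with a single query over the stored sequence instead of re-running attention for the whole prefix.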
Project structure:

```
training-transformer-from-scratch/
├── src/
│   ├── data/
│   │   ├── download_data.py        # Download WMT14 dataset
│   │   ├── preprocess_data.py      # Data preprocessing & BPE training
│   │   ├── raw/                    # Raw dataset files
│   │   └── processed/              # Processed data & BPE model
│   ├── notebooks/
│   │   └── transformer+note.ipynb  # Implementation + notes
│   ├── scripts/
│   │   ├── train.py                # Training script
│   │   ├── evaluate.py             # Evaluation script
│   │   ├── inference.py            # Inference script
│   │   ├── transformer.py          # Transformer architecture
│   │   ├── utils.py                # Helper functions
│   │   └── checkpoints/            # Saved model checkpoints
│   └── config.py                   # Configuration file
├── README.md
└── pyproject.toml
```

Notes:
- GPU Required: Training requires a CUDA-capable GPU
- Memory Usage: Adjust batch size based on available GPU memory
- Training Time: Depends on dataset size and model configuration
- Checkpoints: Best model is saved based on validation loss
Citation:

```bibtex
@article{vaswani2017attention,
  title={Attention is all you need},
  author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, Lukasz and Polosukhin, Illia},
  journal={Advances in neural information processing systems},
  volume={30},
  year={2017}
}
```