This repository contains the implementation and experimental validation of "Quantization Bounds in LoRA Fine-tuning: Theoretical Analysis and Empirical Validation" - a comprehensive theoretical analysis of quantization effects in Low-Rank Adaptation (LoRA) fine-tuning of large language models.
- Rigorous error bounds linking quantization bit-width to fine-tuning performance
- Main result: $\mathbb{E}[L(\hat{\theta}_q)] - L(\theta^*) \leq \tilde{\mathcal{O}}(\sqrt{r}/\sqrt{N}) + \mathcal{O}(r \cdot 2^{-2b}\sigma_g^2)$
- Optimal bit-width rule: $b^* \geq \frac{1}{2}\log_2(r) + \frac{1}{2}\log_2(N) + C$
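The $\mathcal{O}(2^{-2b})$ noise term is easy to check empirically: a uniform $b$-bit quantizer over $[-R, R]$ has step size $\Delta = 2R/2^b$ and error variance $\Delta^2/12$, so each extra bit should cut the noise variance by roughly 4x. A minimal pure-Python sketch (the function names are illustrative and not part of this repo):

```python
import random

def quantize(x, bits, r=1.0):
    """Uniform b-bit quantizer over [-r, r]."""
    step = 2 * r / (2 ** bits)
    return round(x / step) * step

def noise_variance(bits, n=100_000, r=1.0, seed=0):
    """Monte Carlo estimate of the quantization error variance."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.uniform(-r, r)
        total += (quantize(x, bits, r) - x) ** 2
    return total / n

# Each additional bit shrinks the error variance ~4x (the 2^{-2b} scaling).
ratio = noise_variance(4) / noise_variance(5)
```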
- Systematic experiments on DialoGPT fine-tuning
- Strong correlation between theoretical predictions and experimental results
- Comprehensive analysis across multiple bit-widths (16, 8, 4) and ranks (4, 8, 16, 32)
- Precision selection based on LoRA rank and dataset size
- Performance trade-offs for memory-constrained deployments
- Training dynamics insights under quantization
llm-quantization-bounds/
├── .memory/                      # Memory bank for project context
│   ├── 01-brief.md               # Project charter and overview
│   ├── 10-product.md             # Product definition and requirements
│   ├── 20-system.md              # System architecture
│   ├── 30-tech.md                # Technology stack
│   ├── 40-active.md              # Current active work
│   ├── 50-progress.md            # Progress tracking
│   ├── 60-decisions.md           # Decision log
│   └── 70-knowledge.md           # Domain knowledge
├── experiments/                  # Experimental code
│   ├── run_experiment.py         # Main experiment runner
│   ├── download_model.py         # Model download utility
│   ├── simulate_quantization.py  # Theoretical validation
│   └── analyze_results.py        # Comprehensive analysis
├── paper/                        # Research paper
│   ├── main.tex                  # LaTeX source
│   ├── references.bib            # Bibliography
│   └── figures/                  # Generated figures
├── results/                      # Experimental results
│   ├── comprehensive_analysis.png
│   ├── theoretical_validation.png
│   ├── simulation_results.json
│   └── summary_table.csv
├── quant_noise.ipynb             # Theoretical derivation notebook
├── lit_review.md                 # Literature review
├── setup.py                      # Package setup
└── README.md                     # This file
# Clone the repository
git clone https://github.com/your-username/llm-quantization-bounds.git
cd llm-quantization-bounds
# Install dependencies
pip install -e .
# Download the model
python experiments/download_model.py

# Run a single experiment
python experiments/run_experiment.py --bits 16 --rank 16 --seed 0
# Run theoretical validation
python experiments/simulate_quantization.py
# Generate comprehensive analysis
python experiments/analyze_results.py
LoRA Quantization Error Bound:

$$\mathbb{E}[L(\hat{\theta}_q)] - L(\theta^*) \leq \tilde{\mathcal{O}}\left(\frac{\sqrt{r}}{\sqrt{N}}\right) + \mathcal{O}\left(r \cdot 2^{-2b} \sigma_g^2\right)$$

Optimal Bit-width Selection:

$$b^* \geq \frac{1}{2}\log_2(r) + \frac{1}{2}\log_2(N) + \frac{1}{2}\log_2(\sigma_g^2) + C$$

Gradient Variance Bound:

$$\text{Var}[\nabla_{BA} L_q] \leq \text{Var}[\nabla_{BA} L] + L^2 \|x\|^2 \cdot r \cdot 2^{-2b} R^2$$
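Plugging numbers into the bit-width selection rule is straightforward. A hedged sketch (the helper name is illustrative and not part of this repo; `c` stands in for the problem-dependent constant $C$, and `grad_var` is $\sigma_g^2$):

```python
import math

def optimal_bits(rank, n_samples, grad_var=1.0, c=2.0):
    """Minimum bit-width b* from the selection rule above.

    c is a stand-in for the problem-dependent constant C.
    """
    return 0.5 * (math.log2(rank) + math.log2(n_samples) + math.log2(grad_var)) + c

# Example: rank 16, 4096 training samples, unit gradient variance
b_star = optimal_bits(16, 4096)  # 0.5 * (4 + 12 + 0) + 2 = 10 bits
```

Doubling the rank or quadrupling the dataset size each raises the required precision by half a bit or a full bit, respectively, which matches the logarithmic form of the rule.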
Our experiments on DialoGPT fine-tuning demonstrate:
- Exponential bit-width scaling: Performance degrades exponentially with reduced precision
- Rank-precision coupling: Higher ranks require higher precision (linear sensitivity)
- Gradient variance scaling: Follows the predicted $\mathcal{O}(r \cdot 2^{-2b})$ relationship
- Strong theory-practice agreement: Correlation coefficient R > 0.9 between predictions and results
| LoRA Rank | Recommended Precision | Performance Impact |
|---|---|---|
| r ≤ 8 | 8-bit | < 5% degradation |
| 8 < r ≤ 16 | 8-bit | < 10% degradation |
| r > 16 | 16-bit | Minimal impact |
Key Recommendations:
- Use 8-bit quantization for ranks ≤ 16
- Avoid 4-bit quantization for ranks > 8
- Consider dataset size when selecting precision
- Monitor gradient variance for training stability
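The precision-selection table above can be folded into a tiny helper. This is only a sketch of the heuristic, with an assumed function name that does not exist in the repo:

```python
def recommended_bits(rank):
    """Precision heuristic from the guidelines table: 8-bit up to rank 16, 16-bit beyond."""
    if rank <= 16:
        return 8
    return 16

# Example: an r=32 adapter should be trained at 16-bit precision
assert recommended_bits(32) == 16
```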
- Model: DialoGPT-medium (355M parameters)
- Dataset: DailyDialog (conversational fine-tuning)
- Framework: PyTorch + Transformers + PEFT
- Bit-widths: 16-bit (baseline), 8-bit, 4-bit
- LoRA ranks: 4, 8, 16, 32
- Seeds: Multiple for statistical significance
- Metrics: Loss, perplexity, gradient statistics
- Memory: 8GB+ RAM recommended
- GPU: Optional (experiments run on CPU/MPS)
- Storage: 2GB for models and results
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -e .

# Download model
python experiments/download_model.py
# Run systematic experiments
for bits in 16 8 4; do
  for rank in 4 8 16 32; do
    for seed in 0 1; do
      python experiments/run_experiment.py --bits $bits --rank $rank --seed $seed
    done
  done
done

# Theoretical validation
python experiments/simulate_quantization.py
# Comprehensive analysis
python experiments/analyze_results.py

- experiments/run_experiment.py: Main experiment runner with LoRA fine-tuning
- experiments/simulate_quantization.py: Theoretical prediction validation
- experiments/analyze_results.py: Comprehensive result analysis and visualization
- quant_noise.ipynb: Jupyter notebook with theoretical derivations
- lit_review.md: Comprehensive literature review
- paper/main.tex: Research paper LaTeX source
- results/comprehensive_analysis.png: 9-panel analysis figure
- results/theoretical_validation.png: Theory validation plots
- results/simulation_results.json: Simulation data
- results/summary_table.csv: Statistical summary
If you use this work in your research, please cite:
@article{quantization_bounds_2024,
title={Quantization Bounds in LoRA Fine-tuning: Theoretical Analysis and Empirical Validation},
author={Research Team},
journal={arXiv preprint arXiv:2024.XXXX},
year={2024}
}

This project is licensed under the MIT License - see the LICENSE file for details.
We welcome contributions! Please see our contributing guidelines for details.
- The LoRA authors for the foundational work
- The Transformers and PEFT libraries for implementation support
- The quantization research community for theoretical insights
For questions or collaboration opportunities, please contact:
- Email: [email protected]
- GitHub Issues: Create an issue
Note: This research provides theoretical foundations for quantized LoRA fine-tuning. Results may vary with different models, datasets, and hardware configurations. Always validate on your specific use case.