This project investigates joint optimization of LoRA rank allocation and mixed-precision quantization for efficient fine-tuning on laptop-class hardware.
```bash
# Install dependencies
pip install -r requirements.txt

# Run single experiment
python run_experiment.py --config B-FP --task sst2 --model bert-base-uncased

# Run full experiment matrix
python run_all_experiments.py --epochs 3 --batch-size 8
```

Apple Silicon Macs have MPS (Metal Performance Shaders) limitations that can cause tensor size errors. The system automatically detects MPS and applies MPS-safe settings:
```bash
# Automatic MPS detection (recommended)
python run_experiment.py --config B-FP --task sst2 --model bert-base-uncased

# Force MPS-safe settings (smaller batch sizes)
python run_experiment.py --config B-FP --task sst2 --model bert-base-uncased --mps-safe

# Manual batch size reduction
python run_experiment.py --config B-FP --task sst2 --model bert-base-uncased --batch-size 4
```
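For reference, here is a minimal sketch of how MPS detection and a batch-size fallback might look in your own scripts; the `pick_device_settings` helper and the specific cap of 4 are illustrative assumptions, not part of this project's CLI:

```python
import torch

def pick_device_settings(requested_batch_size: int = 8):
    """Hypothetical helper: choose a device and an MPS-safe batch size.

    On Apple Silicon, large tensors can trip the 2**32-byte MPS limit,
    so we fall back to a smaller batch size when MPS is the backend.
    """
    if torch.cuda.is_available():
        return torch.device("cuda"), requested_batch_size
    if torch.backends.mps.is_available():
        # MPS-safe: cap the batch size to reduce per-tensor memory.
        return torch.device("mps"), min(requested_batch_size, 4)
    return torch.device("cpu"), requested_batch_size

device, batch_size = pick_device_settings(8)
print(f"Using {device} with batch size {batch_size}")
```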
If you encounter MPS tensor size errors, try:

- Reduce batch size: `--batch-size 4` or `--batch-size 2`
- Use Docker with a GPU backend (recommended for quantization)
- Disable quantization: `--quant-backend none`
```bash
# Build and run on GPU
./scripts/docker-run.sh --build --gpu --wandb-key $WANDB_API_KEY run-all

# Run single experiment on CPU
./scripts/docker-run.sh --cpu run B-FP sst2

# Interactive development
./scripts/docker-run.sh --gpu shell
```

- B-FP: Baseline fixed-rank LoRA (FP16)
- B-Q4: Baseline 4-bit QLoRA
- B-Ada: Baseline AdaLoRA (adaptive rank)
- Joint-1/2/3: Combined adaptive rank + quantization
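For orientation, the following is a minimal sketch of what a B-Q4-style setup (the 4-bit QLoRA baseline above) could look like using Hugging Face `transformers` and `peft`; the rank, target modules, and other hyperparameters here are illustrative assumptions, not the project's exact configuration:

```python
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization for the frozen base weights (QLoRA-style).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,
    quantization_config=bnb_config,  # requires a CUDA-capable bitsandbytes build
)

# Fixed-rank LoRA adapters on the attention projections (illustrative choices).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query", "value"],
    task_type="SEQ_CLS",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```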
Use `--quant-backend` to control the quantization strategy:
- `auto`: Auto-detect best backend (default)
- `cuda-4bit`: 4-bit quantization on GPU
- `cuda-8bit`: 8-bit quantization on GPU
- `cpu-int8`: 8-bit CPU quantization
- `none`: Disable quantization
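A minimal sketch of how the `auto` choice could be resolved into a concrete backend; the `resolve_backend` helper name and the exact fallback order are assumptions for illustration:

```python
import torch

def resolve_backend(requested: str = "auto") -> str:
    """Hypothetical resolution of --quant-backend=auto to a concrete backend."""
    if requested != "auto":
        return requested
    if torch.cuda.is_available():
        return "cuda-4bit"   # bitsandbytes 4-bit needs a CUDA device
    if torch.backends.mps.is_available():
        return "none"        # bitsandbytes quantization is not supported on MPS
    return "cpu-int8"        # CPU int8 fallback

print(resolve_backend("auto"))
```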
Examples:
```bash
# Force CPU quantization
python run_experiment.py --config B-Q4 --task sst2 --quant-backend cpu-int8

# GPU-only 4-bit (fails on non-CUDA hardware)
python run_experiment.py --config B-Q4 --task sst2 --quant-backend cuda-4bit

# Disable quantization
python run_experiment.py --config B-Q4 --task sst2 --quant-backend none
```

If you see:
```
MPSNDArray.mm:788: failed assertion `[MPSNDArray initWithDevice:descriptor:] Error: total bytes of NDArray > 2**32'
```
Solutions:

- Use smaller batch sizes: `--batch-size 4` or `--batch-size 2`
- Use Docker with GPU: recommended for quantization experiments
- Disable quantization: `--quant-backend none` for local testing
For systems with limited RAM:

- Reduce batch size: `--batch-size 4`
- Use gradient accumulation (automatically adjusted; see the sketch after this list)
- Close other applications
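As a sketch, the number of accumulation steps can be derived from a target effective batch size; the variable names and values below are illustrative, not the project's actual settings:

```python
import math

# Illustrative: keep the effective batch size constant while shrinking
# the per-step batch size to fit in limited RAM.
target_effective_batch = 16   # assumed training target
per_step_batch = 4            # what actually fits in memory
grad_accum_steps = math.ceil(target_effective_batch / per_step_batch)
print(grad_accum_steps)  # 4
```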
On Apple Silicon:

- Use `--quant-backend none` for local development
- Use Docker with GPU for full quantization experiments
- CPU quantization: `--quant-backend cpu-int8` (slower but stable)
```
results/
├── results_B-FP_sst2.json        # Individual experiment results
├── results_B-Q4_wikitext2.json
├── all_results.json              # Aggregated results
└── experiment_summary.csv        # Summary table
```
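To work with the output programmatically, something like the following could collect the per-experiment result files; the field names inside the JSON (`config`, `task`, `accuracy`) are assumptions about the schema, not documented keys:

```python
import json
from pathlib import Path

# Collect every per-experiment result file into one list of records.
records = []
for path in sorted(Path("results").glob("results_*.json")):
    with path.open() as f:
        data = json.load(f)
    # "config", "task", and "accuracy" are assumed keys for illustration only.
    records.append({
        "file": path.name,
        "config": data.get("config"),
        "task": data.get("task"),
        "accuracy": data.get("accuracy"),
    })

for row in records:
    print(row)
```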
- Local: 16GB+ RAM, Apple M-series or Intel/AMD CPU
- Docker: NVIDIA GPU with 8GB+ VRAM (recommended)
- Quantization: CUDA-compatible GPU or CPU fallback