This repository presents a comprehensive empirical study of adaptive rank allocation in LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning. We investigate whether varying LoRA rank per transformer layer can improve the trade-off between fine-tuning performance and parameter efficiency compared to fixed-rank baselines.
Research Question: Can adaptive (non-uniform) LoRA rank allocation across layers outperform fixed-rank LoRA in efficiency or performance?
| Strategy | Eval Loss | Perplexity | Trainable Params (M) | Param Efficiency | Training Time |
|---|---|---|---|---|---|
| Baseline Rank 16 | 4.90 | 134.0 | 6.3 | 1.7% | 81.8s |
| Linear Decay | 5.11 | 165.1 | 6.3 | 1.7% | 83.5s |
| Baseline Rank 8 | 5.31 | 203.3 | 3.1 | 0.9% | 87.5s |
| Attention-Heavy | NaN | NaN | 5.1 | 1.4% | 90.2s |
| Empirical | NaN | NaN | 3.5 | 1.0% | 88.8s |
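The perplexity column is simply the exponential of the evaluation cross-entropy loss, which is why the two columns move together. A quick sanity check against the table (a sketch; the reported values were computed from unrounded losses, so they match only approximately):

```python
import math

def perplexity(eval_loss: float) -> float:
    """Perplexity is the exponential of the mean cross-entropy loss."""
    return math.exp(eval_loss)

print(round(perplexity(4.90), 1))  # close to the 134.0 reported for Baseline Rank 16
print(round(perplexity(5.31), 1))  # close to the 203.3 reported for Baseline Rank 8
```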
- Fixed-rank LoRA (rank 16) achieved best performance on DialoGPT-medium
- Linear decay adaptive strategy was stable but performed 23% worse than best baseline
- Complex adaptive strategies experienced training instability (NaN gradients)
- Gradual rank variation (linear decay) was more stable than dramatic rank changes
- Adaptive strategies require specialized optimization techniques for practical deployment
```bash
# Clone the repository
git clone https://github.com/TaylorMohney/adaptive-lora-placement.git
cd adaptive-lora-placement

# Install dependencies
pip install -r requirements.txt

# Run all experiments (takes ~30 minutes)
bash scripts/run_all_experiments.sh

# Generate analysis and figures
python scripts/analyze_results.py

# View results
open results/analysis/analysis_report.md
```

This work builds on our previous research on selective LoRA placement, which demonstrated that layer type matters for efficient adaptation. We extend it by exploring whether individual layers should have different adaptation capacities.

Previous Work: *Selective LoRA: Systematic Placement Strategies for Parameter-Efficient Fine-Tuning*
- Layer Hierarchy: Different transformer layers capture different types of representations
- Efficiency Challenge: Fixed-rank LoRA treats all layers equally
- Optimization Opportunity: Can we allocate adaptation capacity more strategically?
- Python 3.11+
- PyTorch 2.0+
- CUDA compatible GPU (recommended)
- 8GB+ RAM
```bash
# Clone the repository
git clone https://github.com/TaylorMohney/adaptive-lora-placement.git
cd adaptive-lora-placement

# Install dependencies
pip install -r requirements.txt

# Validate installation
python scripts/validate_setup.py
```

```bash
# Create conda environment
conda create -n adaptive-lora python=3.11
conda activate adaptive-lora

# Install PyTorch (adjust for your CUDA version)
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

# Install other dependencies
pip install -r requirements.txt
```

- Model: DialoGPT-medium (361M parameters, 24 layers)
- Dataset: Alpaca instruction-following dataset (200 examples)
- Task: Conversational response generation
- Evaluation: Perplexity, loss, parameter efficiency
Linear Decay: gradually reduces rank from input to output layers (16→4):

```
rank_i = max(4, 16 - (16 - 4) * i / (24 - 1))
```

Attention-Heavy: higher ranks for attention layers, lower for feed-forward:

```
rank_attention = 16
rank_feedforward = 8
```

Empirical: based on observed transformer learning patterns (early: 10, middle: 19, late: 16):

```
# Early layers: 60% of base rank
# Middle layers: 120% of base rank
# Late layers: 100% of base rank
```
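The three allocation rules can be sketched in a few lines of Python. This is an illustration, not the repository's implementation: `num_layers=24` and `base_rank=16` follow the DialoGPT-medium setup, ranks are rounded to integers, and the empirical strategy is assumed to split the layers into equal thirds:

```python
def linear_decay_ranks(num_layers=24, max_rank=16, min_rank=4):
    """Rank shrinks linearly from the first layer to the last."""
    return [max(min_rank, round(max_rank - (max_rank - min_rank) * i / (num_layers - 1)))
            for i in range(num_layers)]

def attention_heavy_rank(module_name, attn_rank=16, ffn_rank=8):
    """Attention modules get the higher rank, feed-forward modules the lower."""
    return attn_rank if "attn" in module_name else ffn_rank

def empirical_ranks(num_layers=24, base_rank=16):
    """Early layers 60%, middle 120%, late 100% of the base rank."""
    third = num_layers // 3
    ranks = []
    for i in range(num_layers):
        if i < third:
            scale = 0.6      # early layers
        elif i < 2 * third:
            scale = 1.2      # middle layers
        else:
            scale = 1.0      # late layers
        ranks.append(round(base_rank * scale))
    return ranks

print(linear_decay_ranks())  # starts at 16, decays toward 4
print(empirical_ranks())     # 10 early, 19 middle, 16 late
```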
```bash
# Baseline experiments
python scripts/train_baseline.py --rank 8 --output_dir results/baseline_rank8
python scripts/train_baseline.py --rank 16 --output_dir results/baseline_rank16

# Adaptive experiments
python scripts/train_adaptive.py --strategy linear_decay --output_dir results/adaptive_linear_decay
python scripts/train_adaptive.py --strategy attention_heavy --output_dir results/adaptive_attention_heavy
python scripts/train_adaptive.py --strategy empirical --output_dir results/adaptive_empirical
```
```bash
# Run everything (takes ~30 minutes)
bash scripts/run_all_experiments.sh

# Generate analysis and figures
python scripts/analyze_results.py

# View comprehensive report
open results/analysis/analysis_report.md
```

```bash
# Create custom rank allocation
python scripts/train_adaptive.py --strategy custom --config_file my_strategy.json
```

Example `my_strategy.json`:
```json
{
  "strategy_name": "custom",
  "layer_ranks": [16, 16, 14, 12, 10, 8, 8, 6, 6, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
}
```

```
adaptive-lora-placement/
├── data/                    # Preprocessed Alpaca samples
├── models/                  # LoRA configurations per strategy
├── results/                 # Experimental results and analysis
│   ├── baseline_rank8/      # Baseline rank 8 results
│   ├── baseline_rank16/     # Baseline rank 16 results
│   ├── adaptive_*/          # Adaptive strategy results
│   └── analysis/            # Comprehensive analysis and figures
├── scripts/                 # Training and evaluation scripts
│   ├── train_baseline.py    # Fixed-rank LoRA training
│   ├── train_adaptive.py    # Adaptive rank training
│   ├── prepare_data.py      # Data preprocessing
│   ├── analyze_results.py   # Results analysis
│   └── validate_setup.py    # Installation validation
├── paper/                   # Research paper and figures
│   ├── draft.md             # Complete research paper
│   └── figures/             # Publication-quality figures
├── .memory/                 # Memory bank system (internal)
└── requirements.txt         # Python dependencies
```
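A custom strategy file can be sanity-checked before launching training. The helper below is hypothetical (not part of the repository's scripts) and assumes the 24-layer DialoGPT-medium setup:

```python
import json

def load_layer_ranks(path: str, num_layers: int = 24) -> list[int]:
    """Load a custom strategy file and verify it supplies one valid rank per layer."""
    with open(path) as f:
        cfg = json.load(f)
    ranks = cfg["layer_ranks"]
    assert len(ranks) == num_layers, f"expected {num_layers} ranks, got {len(ranks)}"
    assert all(isinstance(r, int) and r > 0 for r in ranks), "ranks must be positive ints"
    return ranks
```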
- Best Overall: Baseline Rank 16 (Perplexity: 134.0)
- Best Adaptive: Linear Decay (Perplexity: 165.1, stable training)
- Training Issues: 2 of 3 adaptive strategies failed with NaN gradients
- Parameter Efficiency: All strategies used 0.9-1.7% of total parameters
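The efficiency figures follow directly from parameter counts: a rank-r LoRA adapter on a d_in × d_out weight adds r·(d_in + d_out) trainable parameters, and the percentage is trainable parameters over the model's 361M total. A quick check with illustrative helper names (not functions from the repository):

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """A rank-r adapter adds a (d_in x r) and an (r x d_out) matrix."""
    return rank * (d_in + d_out)

def efficiency_pct(trainable: float, total: float) -> float:
    """Trainable parameters as a percentage of the full model."""
    return 100 * trainable / total

# Numbers from the results table: 6.3M trainable (rank 16) vs 361M total
print(round(efficiency_pct(6.3e6, 361e6), 1))  # -> 1.7
print(round(efficiency_pct(3.1e6, 361e6), 1))  # -> 0.9 (rank 8)
```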
After running experiments, you'll find:

- Detailed Results: `results/analysis/analysis_report.md`
- Performance Charts: `results/analysis/performance_comparison.png`
- Efficiency Plots: `results/analysis/efficiency_analysis.png`
- Strategy Comparison: `results/analysis/strategy_comparison.png`
- Raw Data: `results/analysis/combined_results.csv`
- Use fixed-rank LoRA (rank 16) for production deployments
- Linear decay shows promise but needs optimization improvements
- Avoid complex adaptive strategies without specialized training techniques
- Monitor gradient behavior carefully with adaptive allocation
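Since two of the three adaptive strategies diverged with NaN gradients, a cheap guard in the training loop is worth having. A minimal PyTorch sketch (the repository's scripts may handle this differently):

```python
import torch

def has_bad_gradients(model: torch.nn.Module) -> bool:
    """Return True if any parameter gradient contains NaN or Inf."""
    for name, param in model.named_parameters():
        if param.grad is not None and not torch.isfinite(param.grad).all():
            print(f"non-finite gradient in {name}")
            return True
    return False

# Sketch of use inside a training loop:
# loss.backward()
# if has_bad_gradients(model):
#     optimizer.zero_grad()  # skip the unstable step
# else:
#     torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
#     optimizer.step()
```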
The complete research paper is available in paper/draft.md and includes:
- Abstract & Introduction: Research motivation and background
- Methodology: Detailed description of adaptive strategies
- Experimental Results: Comprehensive analysis of all strategies
- Discussion: Implications for adaptive LoRA research
- Conclusion: Practical recommendations and future work
- LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2022)
- AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning (Zhang et al., 2023)
- QLoRA: Efficient Finetuning of Quantized LLMs (Dettmers et al., 2023)
- Selective LoRA: Our previous work
| Strategy (Previous) | Loss | Params (M) | Reduction | Perplexity |
|---|---|---|---|---|
| Full LoRA | 3.089 | 6.29 | 0% | 4,283 |
| Attention Only | 3.481 | 4.33 | 31.2% | 2,272 |
| Feed-Forward Only | 4.252 | 1.97 | 68.8% | 3,391 |
We welcome contributions! Please:
- Fork the repository
- Create a feature branch
- Add your adaptive strategy or improvements
- Submit a pull request
- New adaptive rank allocation strategies
- Training stabilization techniques
- Extension to other models (GPT, BERT, etc.)
- Advanced analysis methods
- Performance optimizations
If you use this work in your research, please cite:
```bibtex
@article{mohney2024adaptive,
  title={Adaptive LoRA: Layerwise Rank Allocation for Parameter-Efficient Fine-Tuning},
  author={Mohney, Taylor},
  journal={arXiv preprint arXiv:TBD},
  year={2024}
}
```

- Lead Researcher: Taylor Mohney
- Researcher: Dorian Hryniewicki
- Affiliation: University of Nevada, Las Vegas
- Email: mohney@unlv.nevada.edu
- GitHub Issues: For technical questions and bug reports
This project is licensed under the MIT License - see the LICENSE file for details.
⭐ Star this repository if you find it useful for your research!