The official repository containing the introduction and code for our NAACL 2025 paper: MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning.
| 🔥 News | 💡 Motivation | 🌈 Method |
| ⚡️ Quick Start | 📓 Citation | 📃 Paper |
- Jan 2025: Our paper was accepted to the NAACL 2025 main conference.
- Oct 2024: We released our code and a quick-start guide.
- May 2024: We released our paper on arXiv.
- Full finetuning of LLMs is prohibitively expensive.
- LoRA, the most popular parameter-efficient finetuning method, and its variants initialize the low-rank matrices randomly.
- We argue that this strategy may override important pretrained features, thus degrading the performance of low-rank adaptation methods.
- To this end, we propose Minor singular component based Low Rank Adaptation (MiLoRA) for efficient LLM finetuning.
- Specifically, we use the minor singular components of the pretrained weight matrices to initialize the LoRA matrices.
- This strategy encourages the model to learn in the less-optimized subspace, thus reducing interference with the well-learned pretrained knowledge encoded in the principal singular components.
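The initialization above can be sketched as follows. This is a minimal illustration of the idea, not the repo's actual implementation; the function name `milora_init` and the factorization details (splitting the square root of the singular values between the two factors) are our own assumptions.

```python
import torch

def milora_init(W: torch.Tensor, r: int):
    """Sketch of minor-singular-component LoRA initialization.

    The r smallest singular components of W initialize the trainable
    low-rank factors B @ A; the principal components stay in the
    frozen residual weight.
    """
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    # Singular values are sorted in descending order, so the minor
    # components are the last r columns/rows.
    U_minor, S_minor, Vh_minor = U[:, -r:], S[-r:], Vh[-r:, :]
    # Split sqrt(S) between the two LoRA factors so that
    # B @ A == U_minor @ diag(S_minor) @ Vh_minor.
    B = U_minor * S_minor.sqrt()                 # shape (out, r)
    A = S_minor.sqrt().unsqueeze(1) * Vh_minor   # shape (r, in)
    # Frozen residual keeps the principal components.
    W_res = W - B @ A
    return W_res, B, A

# The frozen residual plus the trainable factors reconstruct W exactly,
# so finetuning starts from the pretrained weights.
W = torch.randn(8, 6)
W_res, B, A = milora_init(W, r=2)
assert torch.allclose(W_res + B @ A, W, atol=1e-5)
```

During finetuning only `B` and `A` are updated, so learning happens in the subspace spanned by the minor singular vectors, orthogonal to the principal subspace.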
We use the code from the LLM-Adapters repo for the commonsense reasoning tasks. Compared to the LoRA implementation in LLM-Adapters, we only modified the LoRA initialization. We directly use their settings in the other experiments without modification.
Our math reasoning code is modified from PiSSA.
For training PEFT modules on the math and code tasks, prepare the following.

- Data (download directly into this path):
  - Math dataset: meta-math/MetaMathQA
- Environment:

  ```shell
  conda create -n milora python=3.10.14 -y
  conda activate milora
  pip install torch==2.3.0
  pip install -r requirements.txt
  ```
- Model (for MiLoRA and PiSSA):
See the shell file and the corresponding `.py` file for more details.

```shell
bash scripts/run_svd_init.sh
```
Train
Run the following shell to train your model; modify `$save_root` to determine where the checkpoints are saved.

```shell
# to train milora/pissa/lora
bash scripts/run_train.sh $method $base_model $save_root
# e.g. bash scripts/run_train.sh milora ./svd_init_models/LLM-Adapters-rank-64-min ./output
```
See the training log in `./logs`. We also set `report_to tensorboard` by default; use `tensorboard --logdir $save_root` to check the TensorBoard output.
The GSM8K and MATH evaluations are already included in the training code; check the evaluation results in `results/gsm8k` and `results/MATH`.
We use the implementation in open-instruct.
We use the implementation in DoRA; for hyperparameters, we directly follow the LoRA setting in Visual Instruction Tuning.
If you find this repo useful, please cite us as:
@inproceedings{wang-etal-2025-milora,
title = "{M}i{L}o{RA}: Harnessing Minor Singular Components for Parameter-Efficient {LLM} Finetuning",
author = "Wang, Hanqing and
Li, Yixia and
Wang, Shuo and
Chen, Guanhua and
Chen, Yun",
editor = "Chiruzzo, Luis and
Ritter, Alan and
Wang, Lu",
booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
month = apr,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.naacl-long.248/",
pages = "4823--4836",
ISBN = "979-8-89176-189-6",
abstract = "Efficient finetuning of large language models (LLMs) aims to adapt the LLMs with reduced computational and memory costs. Previous LoRA-based approaches initialize the low-rank matrices with Gaussian distribution and zero values while keeping the original weight matrices frozen. However, the trainable model parameters optimized in an unguided subspace might interfere with the well-learned subspace of the pretrained weight matrices. In this paper, we propose MiLoRA, a simple yet effective LLM finetuning approach that only updates the minor singular components of the weight matrix while keeping the principal singular components frozen. It is observed that the minor matrix corresponds to the noisy or long-tail information, while the principal matrix contains important knowledge. The MiLoRA initializes the low-rank matrices within a subspace that is orthogonal to the principal matrix, thus the pretrained knowledge is expected to be well preserved. During finetuning, MiLoRA makes the most use of the less-optimized subspace for learning the labeled dataset. Extensive experiments on commonsense reasoning, math reasoning, instruction following and visual instruction following benchmarks present the superior performance of our method."
}