

Awesome-MLLM-Tuning

A curated list of Multimodal Large Language Model (MLLM) tuning resources, accompanying our survey:
Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model


πŸ™Œ Abstract

Multi-modal Large Language Models (MLLMs) integrate visual and linguistic reasoning to address complex tasks such as image captioning and visual question answering. While MLLMs demonstrate remarkable versatility, they show limited performance on specialized applications. Tuning MLLMs for downstream tasks, however, encounters two key challenges: Task-Expert Specialization, where distribution shifts between pre-training and target datasets constrain target performance, and Open-World Stabilization, where catastrophic forgetting erases the model's general knowledge. In this work, we systematically review recent advances in MLLM tuning methodologies, classifying them into three paradigms: (I) Selective Tuning, (II) Additive Tuning, and (III) Reparameterization Tuning. Furthermore, we benchmark these tuning strategies across popular MLLM architectures and diverse downstream tasks to establish a standardized evaluation analysis and systematic tuning principles. Finally, we highlight several open challenges in this domain and propose future research directions.

πŸ“– Papers

Selective Tuning

Iterative Selective Tuning

| Time | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| 2024.10 | AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models | ICLR'25 | link | link |
| 2023.12 | Sparse is Enough in Fine-tuning Pre-trained Large Language Models | ICML'24 | link | link |
| 2023.12 | Gradient-based Parameter Selection for Efficient Fine-Tuning | CVPR'24 | link | link |
| 2023.11 | Unified Low-Resource Sequence Labeling by Sample-Aware Dynamic Sparse Finetuning | EMNLP'23 | link | link |
| 2023.08 | Overcoming Generic Knowledge Loss with Selective Parameter Update | CVPR'24 | link | link |
| 2023.06 | LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation | ICML'23 | link | link |
| 2022.10 | ROSE: Robust Selective Fine-tuning for Pre-trained Language Models | IJCAI'22 | link | link |
| 2022.05 | Parameter-Efficient Sparsity for Large Language Models Fine-Tuning | IJCAI'22 | link | link |
| 2021.10 | Composable Sparse Fine-Tuning for Cross-Lingual Transfer | ACL'22 | link | link |
| 2021.09 | Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning | EMNLP'21 | link | link |
| 2015.06 | Learning both Weights and Connections for Efficient Neural Networks | NeurIPS'15 | link | - |
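
As a quick illustration of the iterative selective idea, the sketch below scores parameters by gradient magnitude on a calibration batch and then masks all other gradients during updates. This is a minimal PyTorch sketch assuming a generic `model`, `loss_fn`, and batches; the 1% budget and the single calibration batch are illustrative choices, not settings from any paper above.

```python
def build_masks(model, loss_fn, calib_batch, keep_ratio=0.01):
    """Score parameters by |gradient| on a calibration batch; keep the top fraction."""
    model.zero_grad()
    loss_fn(model, calib_batch).backward()
    masks = {}
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        k = max(1, int(keep_ratio * p.numel()))
        threshold = p.grad.abs().flatten().topk(k).values.min()
        masks[name] = (p.grad.abs() >= threshold).float()
    model.zero_grad()
    return masks

def masked_step(model, optimizer, loss_fn, batch, masks):
    """One update step in which gradients outside the selected subset are zeroed."""
    optimizer.zero_grad()
    loss_fn(model, batch).backward()
    for name, p in model.named_parameters():
        if name in masks and p.grad is not None:
            p.grad.mul_(masks[name])
    optimizer.step()
```

Iterative variants re-score and re-select the mask periodically during tuning rather than fixing it once.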

Posterior Selective Tuning

| Time | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| 2024.12 | Revisiting Weight Averaging for Model Merging | arXiv'24 | link | link |
| 2024.10 | Parameter Competition Balancing for Model Merging | NeurIPS'24 | link | link |
| 2024.06 | Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging | NeurIPS'24 | link | link |
| 2024.05 | EMR-Merging: Tuning-Free High-Performance Model Merging | NeurIPS'24 | link | link |
| 2024.02 | Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models | ICML'24 | link | link |
| 2023.11 | Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | ICML'24 | link | link |
| 2023.06 | TIES-Merging: Resolving Interference When Merging Models | NeurIPS'23 | link | link |
| 2021.11 | Merging Models with Fisher-Weighted Averaging | NeurIPS'22 | link | link |
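
The posterior (post-hoc) flavour edits or merges weights after fine-tuning rather than constraining the tuning itself. The sketch below keeps only the largest-magnitude fine-tuning deltas and grafts them back onto the pre-trained weights; it is a generic illustration of sparse delta merging under an assumed 5% keep ratio, not the exact procedure of any method in the table.

```python
import torch

@torch.no_grad()
def merge_sparse_delta(pretrained_state, finetuned_state, keep_ratio=0.05):
    """Graft only the largest fine-tuning deltas back onto the pre-trained weights."""
    merged = {}
    for name, w_pre in pretrained_state.items():
        w_ft = finetuned_state[name]
        if not torch.is_floating_point(w_pre):       # skip integer buffers, e.g. step counters
            merged[name] = w_ft
            continue
        delta = w_ft - w_pre
        k = max(1, int(keep_ratio * delta.numel()))
        threshold = delta.abs().flatten().topk(k).values.min()
        # Zero out small deltas; retain only the most salient task-specific changes.
        sparse_delta = torch.where(delta.abs() >= threshold, delta, torch.zeros_like(delta))
        merged[name] = w_pre + sparse_delta
    return merged
```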

Additive Tuning

Adapter Tuning

| Time | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| 2024.04 | Conditional Prototype Rectification Prompt Learning | TCSVT'25 | link | link |
| 2023.11 | Meta-Adapter: An Online Few-shot Learner for Vision-Language Model | NeurIPS'23 | link | link |
| 2023.09 | GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph | NeurIPS'23 | link | link |
| 2023.04 | Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement | ICCV'23 | link | link |
| 2023.03 | Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens | CVPR'23 | link | link |
| 2023.02 | Side Adapter Network for Open-Vocabulary Semantic Segmentation | CVPR'23 | link | link |
| 2022.11 | Task Residual for Tuning Vision-Language Models | CVPR'23 | link | link |
| 2022.06 | LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning | NeurIPS'22 | link | link |
| 2021.11 | Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling | ECCV'22 | link | link |
| 2021.10 | CLIP-Adapter: Better Vision-Language Models with Feature Adapters | IJCV'23 | link | link |
| 2019.02 | Parameter-Efficient Transfer Learning for NLP | ICML'19 | link | link |
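
For reference, a bottleneck adapter in the spirit of Houlsby et al. (ICML'19) looks roughly like the sketch below: a small down-project / nonlinearity / up-project block with a residual connection, inserted into a frozen backbone. The bottleneck width and initialization are illustrative.

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual connection."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)   # zero init: the tuned model starts identical to the frozen one
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))
```

Adapters are typically placed after attention or feed-forward sublayers, and only the adapter parameters (plus any task head) receive gradients.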

Prompt Tuning

| Time | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| 2024.03 | PromptKD: Unsupervised Prompt Distillation for Vision-Language Models | CVPR'24 | link | link |
| 2024.03 | Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation | CVPR'24 | link | link |
| 2024.01 | Learning to Prompt with Text Only Supervision for Vision-Language Models | AAAI'25 | link | link |
| 2023.11 | ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models | CVPR'24 | link | link |
| 2023.09 | DePT: Decoupled Prompt Tuning | CVPR'24 | link | link |
| 2023.09 | Distribution-Aware Prompt Tuning for Vision-Language Models | ICCV'23 | link | link |
| 2023.08 | Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models | ICCV'23 | link | - |
| 2023.07 | Self-Regulating Prompts: Foundational Model Adaptation without Forgetting | ICCV'23 | link | link |
| 2023.03 | Visual-Language Prompt Tuning with Knowledge-Guided Context Optimization | CVPR'23 | link | link |
| 2022.10 | MaPLe: Multi-Modal Prompt Learning | CVPR'23 | link | link |
| 2022.10 | Prompt Learning with Optimal Transport for Vision-Language Models | ICLR'23 | link | link |
| 2022.06 | DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations | NeurIPS'22 | link | link |
| 2022.05 | Prompt-Aligned Gradient for Prompt Tuning | ICCV'23 | link | link |
| 2022.03 | Visual Prompt Tuning | ECCV'22 | link | link |
| 2022.03 | Conditional Prompt Learning for Vision-Language Models | CVPR'22 | link | link |
| 2021.09 | Learning to Prompt for Vision-Language Models | IJCV'22 | link | link |
| 2021.01 | Prefix-Tuning: Optimizing Continuous Prompts for Generation | ACL'21 | link | link |
| 2020.10 | AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts | EMNLP'20 | link | link |
| 2019.11 | How Can We Know What Language Models Know? | TACL'20 | link | link |
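
The soft (continuous) prompt tuning used in many of the works above can be summarized by the sketch below: a handful of learnable prompt vectors are prepended to the frozen backbone's token embeddings, and only those vectors are trained. The prompt length and embedding size are illustrative.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Learnable prompt vectors prepended to the token embeddings of a frozen backbone."""
    def __init__(self, n_tokens=16, embed_dim=768):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_tokens, embed_dim) * 0.02)

    def forward(self, token_embeddings):
        # token_embeddings: (batch, seq_len, embed_dim)
        batch = token_embeddings.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, token_embeddings], dim=1)
```

Only `SoftPrompt.prompt` (and optionally a task head) is optimized; all backbone weights stay frozen, which is what keeps general knowledge intact.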

Reparameterization Tuning

Structure Reparameterization Tuning

| Time | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| 2025.02 | REMEDY: Recipe Merging Dynamics in Large Vision-Language Models | ICLR'25 | link | - |
| 2024.12 | LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation | ICCV'25 | link | link |
| 2024.08 | TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition | arXiv'24 | link | link |
| 2024.06 | Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging | NeurIPS'24 | link | link |
| 2024.06 | Mixture-of-Subspaces in Low-Rank Adaptation | EMNLP'24 | link | link |
| 2024.06 | ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-Tuning via Shared Low-Rank Adaptation | arXiv'24 | link | link |
| 2024.05 | Parameter-Efficient Fine-Tuning with Discrete Fourier Transform | ICML'24 | link | link |
| 2024.03 | MTLoRA: Low-Rank Adaptation Approach for Efficient Multi-Task Learning | CVPR'24 | link | link |
| 2024.02 | Multimodal Instruction Tuning with Conditional Mixture of LoRA | ACL'24 | link | link |
| 2023.12 | LoRAMoE: Alleviating World Knowledge Forgetting in Large Language Models via MoE-Style Plugin | ACL'24 | link | link |
| 2023.10 | VeRA: Vector-based Random Matrix Adaptation | ICLR'24 | link | link |
| 2023.07 | LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition | COLM'24 | link | link |
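
The common core of these methods is the LoRA reparameterization W + (alpha/r)·BA, where only the low-rank factors A and B are trained. The minimal PyTorch sketch below wraps a frozen `nn.Linear`; the rank and scaling values are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer reparameterized as W x + (alpha / r) * B A x."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # freeze the pre-trained weight and bias
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: starts as the base model
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling
```

After tuning, the update BA can be merged back into W, so inference incurs no extra cost; the structural variants above differ mainly in how the factors are shared, mixed, or composed.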

Calibration Reparameterization Tuning

| Time | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| 2025.03 | LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models | CVPR'25 | link | link |
| 2024.10 | Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language Models | arXiv'24 | link | - |
| 2024.07 | Learn to Preserve and Diversify: Parameter-Efficient Group with Orthogonal Regularization for Domain Generalization | ECCV'24 | link | link |
| 2024.06 | CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-Tuning | NeurIPS'24 | link | link |
| 2024.06 | MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning | NAACL'25 | link | link |
| 2024.04 | PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models | NeurIPS'24 | link | link |
| 2024.03 | LoRA Meets Dropout under a Unified Framework | ACL'24 | link | - |
| 2024.03 | BiLoRA: A Bi-Level Optimization Framework for Overfitting-Resilient Low-Rank Adaptation of Large Pre-Trained Models | arXiv'24 | link | - |
| 2024.02 | MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning | ACL'24 | link | link |
| 2024.02 | PRoLoRA: Partial Rotation Empowers More Parameter-Efficient LoRA | ACL'24 | link | link |
| 2024.02 | LoRA+: Efficient Low Rank Adaptation of Large Models | ICML'24 | link | link |
| 2024.02 | DoRA: Weight-Decomposed Low-Rank Adaptation | ICML'24 | link | link |
| 2024.02 | Flora: Low-Rank Adapters Are Secretly Gradient Compressors | ICML'24 | link | link |
| 2023.08 | Bayesian Low-Rank Adaptation for Large Language Models | ICLR'24 | link | link |
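
Calibration-style methods constrain how the low-rank update interacts with the pre-trained weights. The sketch below shows one illustrative regularizer of this kind: it penalizes the component of the LoRA update that falls inside the dominant singular subspace of the frozen weight, reusing the `LoRALinear` sketch above. This is a generic example, not the exact objective of any paper in the table.

```python
import torch

def subspace_penalty(lora_layer, top_k=16):
    """Penalize the part of the LoRA update lying in W's top singular subspace."""
    with torch.no_grad():
        # In practice U_top would be computed once per layer and cached, not per step.
        U, _, _ = torch.linalg.svd(lora_layer.base.weight, full_matrices=False)
        U_top = U[:, :top_k]                      # dominant left singular directions of the frozen W
    delta = lora_layer.B @ lora_layer.A           # current low-rank update, shape (out, in)
    projected = U_top @ (U_top.T @ delta)         # component inside the protected subspace
    return projected.pow(2).sum()

# Training then minimizes: task_loss + reg_weight * sum(subspace_penalty(l) for l in lora_layers)
```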

πŸ‘‹ Contact

This repository is currently maintained by Wenke Huang πŸ‘¨β€πŸ’».
If you have any questions, concerns, or suggestions about the contents of this repository or the resources shared here, feel free to reach out; I'm happy to help with any inquiries or point you to the right materials.
Please don't hesitate to email me at [email protected] πŸ“§ or contact me on WeChat πŸ€—.

πŸ₯³ Citation

If you find this repository helpful for your research, we would greatly appreciate it if you could cite our papers. ✨

@misc{MLLMTuning_arXiv25,
      title={Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model}, 
      author={Wenke Huang and Jian Liang and Xianda Guo and Yiyang Fang and Guancheng Wan and Xuankun Rong and Chi Wen and Zekun Shi and Qingyun Li and Didi Zhu and Yanbiao Ma and Ke Liang and Bin Yang and He Li and Jiawei Shao and Mang Ye and Bo Du},
      year={2025},
      eprint={2503.04543},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
}

@inproceedings{LiangLoRASculpt_CVPR2025,
    author    = {Liang, Jian and Huang, Wenke and Wan, Guancheng and Yang, Qu and Ye, Mang},
    title     = {LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models},
    booktitle = {CVPR},
    year      = {2025},
}

@inproceedings{FangSEPM_ICML2025,
  title     = {Catch Your Emotion: Sharpening Emotion Perception in Multimodal Large Language Models},
  author    = {Fang, Yiyang and Liang, Jian and Huang, Wenke and Li, He and Su, Kehua and Ye, Mang},
  booktitle = {ICML},
  year      = {2025},
}

@misc{ye2025surveysafetylargevisionlanguage,
      title={A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations}, 
      author={Mang Ye and Xuankun Rong and Wenke Huang and Bo Du and Nenghai Yu and Dacheng Tao},
      year={2025},
      eprint={2502.14881},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
}

πŸ” Relevant Projects

[1] LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models - CVPR 2025 [Link][Code]

[2] Catch Your Emotion: Sharpening Emotion Perception in Multimodal Large Language Models - ICML 2025 [Link][Code]

[3] A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations - arXiv 2025 [Link][Code]

You Only Live Once.

I hope that all players have fun.
