Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model

📅 2025-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal large language models (MLLMs) face two fundamental challenges in downstream fine-tuning: *Task-Expert Specialization*—performance degradation on target tasks caused by the distribution shift between pre-training and target data—and *Open-World Stabilization*—catastrophic forgetting that erodes general capabilities. This survey analyzes both challenges jointly. It categorizes fine-tuning paradigms into three classes—Selective, Additive, and Reparameterization—and benchmarks representative methods across popular MLLM architectures, including LLaVA and Qwen-VL, on diverse tasks such as VQA and image captioning. The empirical analysis reveals consistent trade-offs between task-specific performance and retention of general knowledge across paradigms. To support reproducibility and ongoing progress, the authors maintain Awesome-MLLM-Tuning, a continuously updated repository of methods, benchmarks, and best practices for MLLM adaptation.

📝 Abstract
Multi-modal Large Language Models (MLLMs) integrate visual and linguistic reasoning to address complex tasks such as image captioning and visual question answering. While MLLMs demonstrate remarkable versatility, they show limited performance on specialized applications. However, tuning MLLMs for downstream tasks encounters two key challenges: Task-Expert Specialization, where distribution shifts between pre-training and target datasets constrain target performance, and Open-World Stabilization, where catastrophic forgetting erases the model's general knowledge. In this work, we systematically review recent advancements in MLLM tuning methodologies, classifying them into three paradigms: (I) Selective Tuning, (II) Additive Tuning, and (III) Reparameterization Tuning. Furthermore, we benchmark these tuning strategies across popular MLLM architectures and diverse downstream tasks to establish standardized evaluation analysis and systematic tuning principles. Finally, we highlight several open challenges in this domain and propose future research directions. To facilitate ongoing progress in this rapidly evolving field, we provide a public repository that continuously tracks developments: https://github.com/WenkeHuang/Awesome-MLLM-Tuning.
Problem

Research questions and friction points this paper is trying to address.

Address performance limitations in specialized applications of MLLMs
Overcome Task-Expert Specialization and Open-World Stabilization challenges
Systematically review and benchmark MLLM tuning methodologies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective Tuning: adapt to the target task by updating only a chosen subset of the model's existing parameters
Additive Tuning: extend the frozen backbone with small new trainable modules, such as adapters or prompts
Reparameterization Tuning: learn reparameterized (e.g., low-rank) updates to existing weights rather than modifying them directly
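The three paradigms above can be sketched in a few lines. This is a minimal illustrative example, not the paper's implementation: the toy backbone, the residual `Adapter`, and the LoRA-style `LoRALinear` are standard PEFT patterns assumed here for concreteness.

```python
import torch
import torch.nn as nn

# Toy backbone standing in for an MLLM layer stack (illustrative only).
backbone = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))

# (I) Selective Tuning: freeze everything, then unfreeze a chosen subset
# of existing parameters (here, only the last layer).
for p in backbone.parameters():
    p.requires_grad = False
for p in backbone[-1].parameters():
    p.requires_grad = True

# (II) Additive Tuning: keep the backbone frozen and insert a small new
# trainable module; a residual bottleneck adapter is the classic form.
class Adapter(nn.Module):
    def __init__(self, dim, bottleneck=2):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual connection

adapter = Adapter(8)

# (III) Reparameterization Tuning (LoRA-style): leave W frozen and learn a
# low-rank update B @ A that is added to the layer's output.
class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=2):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init

    def forward(self, x):
        return self.base(x) + x @ (self.B @ self.A).T

lora = LoRALinear(nn.Linear(8, 8))
```

Because `B` is zero-initialized, the LoRA layer reproduces the frozen base layer exactly at the start of tuning, so adaptation begins from the pre-trained behavior; this is one concrete way the "keeping yourself" concern maps onto parameter-efficient design.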