LEAML: Label-Efficient Adaptation to Out-of-Distribution Visual Tasks for Multimodal Large Language Models

📅 2025-10-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal large language models (MLLMs) excel on general vision tasks but degrade sharply on out-of-distribution (OOD) tasks in specialized domains, such as medical imaging, where labeled data is severely scarce. Method: We propose LEAML, a label-efficient, parameter-efficient adaptation framework with three components: (1) pseudo question-answer (QA) pair generation for unlabeled images via a dedicated QA generator; (2) caption distillation to regularize the QA generator and enforce representation consistency; and (3) selective fine-tuning of only the neurons most relevant to question answering. Contribution/Results: By jointly exploiting scarce labeled data and abundant unlabeled images, LEAML substantially outperforms standard full-parameter fine-tuning on two low-supervision benchmarks, gastrointestinal endoscopy QA and sports VQA, demonstrating strong generalization under extreme data scarcity together with high computational efficiency.
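The "selective fine-tuning of only QA-relevant neurons" component can be illustrated with a small sketch. The paper does not specify here how relevance is scored, so this sketch assumes a simple proxy: rank output neurons of a linear layer by gradient magnitude on a task loss, then mask gradients so only the top-ranked neurons are updated. All names (`neuron_mask_from_grads`, `apply_selective_update`) and the MSE proxy loss are illustrative assumptions, not the authors' method.

```python
# Hedged sketch: selective neuron fine-tuning via gradient masking.
# Relevance scoring (gradient-norm top-k) is an assumption for illustration.
import torch
import torch.nn as nn

def neuron_mask_from_grads(weight_grad: torch.Tensor, top_k: int) -> torch.Tensor:
    """Keep the top_k output neurons (weight rows) with the largest gradient norm."""
    scores = weight_grad.norm(dim=1)            # one relevance score per output neuron
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask[scores.topk(top_k).indices] = True
    return mask                                  # shape: [out_features]

def apply_selective_update(layer: nn.Linear, mask: torch.Tensor) -> None:
    """Register a hook that zeroes gradients of all non-selected neurons."""
    def hook(grad: torch.Tensor) -> torch.Tensor:
        return grad * mask.unsqueeze(1).to(grad.dtype)
    layer.weight.register_hook(hook)

# Toy demonstration on a single linear layer (stand-in for an MLLM sublayer).
torch.manual_seed(0)
layer = nn.Linear(8, 4)
x, y = torch.randn(16, 8), torch.randn(16, 4)

# 1) Score neurons with one backward pass on a proxy loss.
nn.functional.mse_loss(layer(x), y).backward()
mask = neuron_mask_from_grads(layer.weight.grad, top_k=2)
layer.zero_grad()

# 2) Fine-tune with gradients masked to the selected neurons only.
apply_selective_update(layer, mask)
before = layer.weight.detach().clone()
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
nn.functional.mse_loss(layer(x), y).backward()
opt.step()

# Only the 2 selected weight rows change after the update.
changed = (layer.weight.detach() - before).abs().sum(dim=1) > 0
print(changed.tolist())
```

In a full MLLM the same masking would be applied per layer, keeping the vast majority of parameters frozen, which is where the computational-efficiency claim comes from.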

📝 Abstract
Multimodal Large Language Models (MLLMs) have achieved strong performance on general visual benchmarks but struggle with out-of-distribution (OOD) tasks in specialized domains such as medical imaging, where labeled data is limited and expensive. We introduce LEAML, a label-efficient adaptation framework that leverages both scarce labeled VQA samples and abundant unlabeled images. Our approach generates domain-relevant pseudo question-answer pairs for unlabeled data using a QA generator regularized by caption distillation. Importantly, we selectively update only those neurons most relevant to question-answering, enabling the QA generator to efficiently acquire domain-specific knowledge during distillation. Experiments on gastrointestinal endoscopy and sports VQA demonstrate that LEAML consistently outperforms standard fine-tuning under minimal supervision, highlighting the effectiveness of the proposed framework.
Problem

Research questions and friction points this paper is trying to address.

MLLMs suffer significant performance drops on OOD tasks in specialized domains such as medical imaging
Labeled VQA data in these domains is scarce and expensive to obtain
Standard full-parameter fine-tuning is costly and generalizes poorly under minimal supervision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates domain-relevant pseudo question-answer pairs from unlabeled images
Uses caption distillation to regularize the QA generator
Selectively updates only the neurons relevant to question answering
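The caption-distillation regularizer mentioned above can be sketched as a combined objective. The exact loss used in LEAML is not given in this summary, so the form below is an assumption: a supervised cross-entropy term on the few labeled QA pairs plus a temperature-scaled KL term that pulls the QA generator's token distribution on unlabeled images toward a frozen captioner's distribution. The function name `leaml_style_loss` and the weighting scheme are hypothetical.

```python
# Hedged sketch: caption distillation as a regularizer for the QA generator.
# The specific loss in LEAML is assumed here, not taken from the paper.
import torch
import torch.nn.functional as F

def leaml_style_loss(qa_logits_labeled, labels,
                     qa_logits_unlabeled, caption_logits,
                     distill_weight=0.5, temperature=2.0):
    """Supervised QA loss + caption-distillation KL regularizer (assumed form)."""
    # Cross-entropy on the scarce labeled QA pairs.
    qa_loss = F.cross_entropy(qa_logits_labeled, labels)
    # Soft-target KL against a frozen captioner on unlabeled images.
    t = temperature
    distill = F.kl_div(
        F.log_softmax(qa_logits_unlabeled / t, dim=-1),
        F.softmax(caption_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
    return qa_loss + distill_weight * distill

# Toy tensors standing in for model outputs over a 10-token vocabulary.
torch.manual_seed(0)
loss = leaml_style_loss(
    qa_logits_labeled=torch.randn(4, 10),
    labels=torch.randint(0, 10, (4,)),
    qa_logits_unlabeled=torch.randn(8, 10),
    caption_logits=torch.randn(8, 10),
)
print(float(loss) > 0)  # prints True: CE is positive and KL is non-negative
```

When the QA generator's unlabeled-data distribution matches the captioner's, the KL term vanishes and only the supervised QA loss remains, which is the sense in which distillation acts as a regularizer rather than the primary objective.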