🤖 AI Summary
Multimodal large language models (MLLMs) excel on general vision tasks but suffer significant performance degradation on out-of-distribution (OOD) tasks in specialized domains such as medical imaging, where labeled data is severely scarce. Method: We propose LEAML, a low-resource, parameter-efficient adaptation framework with three components: (1) pseudo question-answer (QA) pair generation for unlabeled images via a dedicated QA generator; (2) caption-distillation regularization of that generator to enforce representation consistency; and (3) selective fine-tuning of only the neurons most relevant to the QA task. Contribution/Results: By jointly exploiting scarce labeled data and abundant unlabeled data, LEAML substantially outperforms standard full-parameter fine-tuning on two low-supervision benchmarks, gastrointestinal endoscopy QA and sports VQA, demonstrating strong generalization under extreme data scarcity alongside high computational efficiency.
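The caption-distillation regularization described above can be sketched as a combined training objective: a QA loss on generated pairs plus a consistency term tying the generator's representation to a caption teacher. This is a minimal illustrative sketch; the paper's exact loss form is not specified here, so the MSE consistency term, the function names, and the weighting `lam` are all assumptions.

```python
import numpy as np

def caption_distillation_loss(student_feat, teacher_feat):
    """Consistency term between the QA generator's features and a
    caption-model teacher's features (MSE is an assumed choice)."""
    return float(np.mean((student_feat - teacher_feat) ** 2))

def total_loss(qa_loss, student_feat, teacher_feat, lam=0.5):
    """Hypothetical combined objective: QA supervision plus a
    lam-weighted caption-distillation regularizer."""
    return qa_loss + lam * caption_distillation_loss(student_feat, teacher_feat)

# Toy example: identical features incur no distillation penalty.
student = np.array([0.2, -0.1, 0.5])
teacher = np.array([0.2, -0.1, 0.5])
loss = total_loss(qa_loss=1.0, student_feat=student, teacher_feat=teacher)
```

When the student drifts from the teacher, the second term grows, pulling the generator's representations back toward caption-consistent features on unlabeled images.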
📝 Abstract
Multimodal Large Language Models (MLLMs) have achieved strong performance on general visual benchmarks but struggle with out-of-distribution (OOD) tasks in specialized domains such as medical imaging, where labeled data is scarce and expensive to obtain. We introduce LEAML, a label-efficient adaptation framework that leverages both scarce labeled VQA samples and abundant unlabeled images. Our approach generates domain-relevant pseudo question-answer pairs for unlabeled data using a QA generator regularized by caption distillation. Importantly, we selectively update only those neurons most relevant to question-answering, enabling the QA generator to efficiently acquire domain-specific knowledge during distillation. Experiments on gastrointestinal endoscopy and sports VQA demonstrate that LEAML consistently outperforms standard fine-tuning under minimal supervision, underscoring its effectiveness for label-efficient OOD adaptation.
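The selective neuron update can be pictured as masking the gradient step so that only QA-relevant parameters move. A minimal NumPy sketch follows; the relevance criterion (accumulated gradient magnitude on QA data) and the function names are assumptions for illustration, not the paper's actual selection rule.

```python
import numpy as np

def select_relevant_neurons(grad_accumulator, k):
    """Rank neurons by accumulated gradient magnitude on QA data and
    mark the top-k as trainable (hypothetical relevance criterion)."""
    idx = np.argsort(np.abs(grad_accumulator))[::-1][:k]
    mask = np.zeros_like(grad_accumulator, dtype=bool)
    mask[idx] = True
    return mask

def masked_update(weights, grad, mask, lr=0.1):
    """Apply a gradient step only to neurons flagged as relevant;
    all other neurons stay frozen."""
    return weights - lr * grad * mask

# Toy example: of 6 "neurons", only the 2 with the largest accumulated
# QA gradients (indices 1 and 3) receive an update.
w = np.zeros(6)
acc_grad = np.array([0.1, 2.0, 0.05, 1.5, 0.2, 0.0])
mask = select_relevant_neurons(acc_grad, k=2)
w_new = masked_update(w, np.ones(6), mask)
```

Freezing the irrelevant parameters keeps the update parameter-efficient and limits drift of general-domain knowledge under extreme data scarcity.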