Synergizing Discriminative Exemplars and Self-Refined Experience for MLLM-based In-Context Learning in Medical Diagnosis

📅 2026-03-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of general-purpose multimodal large language models (MLLMs) in medical diagnosis, which often fail to capture domain-specific details and suffer from high fine-tuning costs and poor scalability. The authors propose a novel, fine-tuning-free in-context learning framework that emulates clinical reasoning by integrating Discriminative Exemplar Coreset Selection (DECS) and Self-Refined Experience Summarization (SRES). This approach jointly constructs a visual coreset and a dynamic experience repository, enabling parameter-efficient medical reasoning. Evaluated across all 12 datasets of the MedMNIST 2D benchmark, the method significantly outperforms zero-shot general and medical MLLMs and achieves performance on par with fully supervised vision models and domain-fine-tuned MLLMs.
📝 Abstract
General Multimodal Large Language Models (MLLMs) often underperform in capturing domain-specific nuances in medical diagnosis, trailing behind fully supervised baselines. Although fine-tuning provides a remedy, the high costs of expert annotation and massive computational overhead limit its scalability. To bridge this gap without updating the weights of the pre-trained backbone of the MLLM, we propose a Clinician Mimetic Workflow. This is a novel In-Context Learning (ICL) framework designed to synergize Discriminative Exemplar Coreset Selection (DECS) and Self-Refined Experience Summarization (SRES). Specifically, DECS simulates a clinician's ability to reference "anchor cases" by selecting discriminative visual coresets from noisy data at the computational level; meanwhile, SRES mimics the cognition and reflection in clinical diagnosis by distilling diverse rollouts into a dynamic textual Experience Bank. Extensive evaluation across all 12 datasets of the MedMNIST 2D benchmark demonstrates that our method outperforms zero-shot general and medical MLLMs. Simultaneously, it achieves performance levels comparable to fully supervised vision models and domain-specific fine-tuned MLLMs, setting a new benchmark for parameter-efficient medical in-context learning. Our code is available at an anonymous repository: https://anonymous.4open.science/r/Synergizing-Discriminative-Exemplars-and-Self-Refined-Experience-ED74.
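The DECS idea in the abstract (selecting discriminative "anchor cases" from noisy data) can be sketched as a margin-based exemplar picker over image embeddings. The scoring rule and the `select_coreset` helper below are illustrative assumptions, not the paper's actual implementation: here an exemplar is "discriminative" if it lies close to its own class centroid but far from every other class centroid.

```python
import numpy as np

def select_coreset(embeddings, labels, k_per_class=2):
    """Illustrative discriminative exemplar selection (not the paper's code).

    Scores each sample by its margin: distance to the nearest
    other-class centroid minus distance to its own class centroid.
    Keeps the top-k highest-margin samples per class.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = {c: embeddings[labels == c].mean(axis=0) for c in classes}

    coreset = []
    for c in classes:
        idx = np.where(labels == c)[0]
        # Distance of each class-c sample to its own centroid ...
        own = np.linalg.norm(embeddings[idx] - centroids[c], axis=1)
        # ... and to the nearest competing-class centroid.
        other = np.min(
            [np.linalg.norm(embeddings[idx] - centroids[o], axis=1)
             for o in classes if o != c],
            axis=0,
        )
        margin = other - own  # larger margin = more discriminative
        coreset.extend(idx[np.argsort(-margin)[:k_per_class]].tolist())
    return sorted(coreset)
```

The selected indices would then serve as the in-context exemplars shown to the MLLM; the SRES experience text described in the abstract would be prepended alongside them in the prompt.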
Problem

Research questions and friction points this paper is trying to address.

Medical Diagnosis
Multimodal Large Language Models
In-Context Learning
Domain-Specific Nuances
Parameter-Efficient Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

In-Context Learning
Discriminative Exemplar Coreset Selection
Self-Refined Experience Summarization
Multimodal Large Language Models
Medical Diagnosis