🤖 AI Summary
This work addresses the limitations of general-purpose multimodal large language models (MLLMs) in medical diagnosis: they often fail to capture domain-specific details, and fine-tuning them incurs high costs and scales poorly. The authors propose a novel, fine-tuning-free in-context learning framework that emulates clinical reasoning by integrating Discriminative Exemplar Coreset Selection (DECS) and Self-Refined Experience Summarization (SRES). This approach jointly constructs a visual coreset and a dynamic experience repository, enabling parameter-efficient medical reasoning. Evaluated across all 12 datasets of the MedMNIST 2D benchmark, the method significantly outperforms zero-shot general and medical MLLMs and achieves performance on par with fully supervised vision models and domain-fine-tuned MLLMs.
📝 Abstract
General Multimodal Large Language Models (MLLMs) often fail to capture domain-specific nuances in medical diagnosis, trailing fully supervised baselines. Although fine-tuning offers a remedy, the high cost of expert annotation and the massive computational overhead limit its scalability. To bridge this gap without updating the weights of the pre-trained MLLM backbone, we propose the Clinician Mimetic Workflow, a novel In-Context Learning (ICL) framework that synergizes Discriminative Exemplar Coreset Selection (DECS) and Self-Refined Experience Summarization (SRES). Specifically, DECS simulates a clinician's ability to reference "anchor cases" by selecting discriminative visual coresets from noisy data; meanwhile, SRES mimics the cognition and reflection of clinical diagnosis by distilling diverse rollouts into a dynamic textual Experience Bank. Extensive evaluation across all 12 datasets of the MedMNIST 2D benchmark demonstrates that our method outperforms zero-shot general and medical MLLMs while achieving performance comparable to fully supervised vision models and domain-specific fine-tuned MLLMs, setting a new benchmark for parameter-efficient medical in-context learning. Our code is available at an anonymous repository: https://anonymous.4open.science/r/Synergizing-Discriminative-Exemplars-and-Self-Refined-Experience-ED74.
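To give a flavor of the exemplar-selection idea, the sketch below implements a generic coreset heuristic (greedy farthest-point sampling over image embeddings). The function name, the use of raw Euclidean distance, and the toy embeddings are all assumptions for illustration; this is not the paper's actual DECS procedure, which additionally targets discriminativeness and noise filtering.

```python
import numpy as np

def select_coreset(features, k, seed=0):
    """Pick k diverse exemplars by greedy farthest-point sampling.

    Illustrative stand-in for coreset selection, NOT the paper's DECS:
    each step adds the point farthest from all points chosen so far.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    chosen = [int(rng.integers(n))]  # random first anchor case
    # distance of every point to the current coreset
    dists = np.linalg.norm(features - features[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))  # farthest remaining point
        chosen.append(nxt)
        dists = np.minimum(dists,
                           np.linalg.norm(features - features[nxt], axis=1))
    return chosen

# toy usage: 100 random 16-d "image embeddings", pick 5 exemplars
emb = np.random.default_rng(1).normal(size=(100, 16))
print(select_coreset(emb, 5))
```

In an ICL pipeline, the selected indices would point at labeled images whose (image, label) pairs are placed in the prompt as in-context "anchor cases" for the frozen MLLM.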