Retrieval-augmented in-context learning for multimodal large language models in disease classification

📅 2025-05-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the poor few-shot generalization of multimodal large language models in medical disease classification, this paper proposes a Retrieval-Augmented In-Context Learning (RAICL) framework. RAICL integrates Retrieval-Augmented Generation (RAG) with In-Context Learning (ICL) for multimodal medical classification, dynamically retrieving semantically similar examples via adaptive fusion of ResNet (image) and BERT/BioBERT/ClinicalBERT (text) embeddings, and constructing conversational prompts optimized for ICL. On the TCGA and IU Chest X-ray datasets, RAICL achieves absolute accuracy improvements of 5.14% and 7.34%, respectively. Ablation studies show that the textual modality contributes most to performance, that Euclidean distance yields the highest accuracy, and that cosine similarity achieves the better macro-F1. The gains hold across diverse multimodal large language models, including Qwen, LLaVA, and Gemma.
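The retrieval-and-prompting pipeline the summary describes can be sketched in a few functions: fuse the image and text embeddings, rank the candidate pool by similarity, and interleave the retrieved (input, label) pairs into a conversational prompt. This is a minimal illustration, not the authors' implementation; the fusion weight `alpha`, the assumption that both embeddings are projected to a common dimension, and the example labels are all hypothetical.

```python
import math

def fuse(img_emb, txt_emb, alpha=0.5):
    # Hypothetical adaptive fusion: L2-normalize each modality (assumed
    # already projected to a common dimension), then mix with weight alpha.
    def norm(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]
    img, txt = norm(img_emb), norm(txt_emb)
    return [alpha * i + (1 - alpha) * t for i, t in zip(img, txt)]

def retrieve_top_k(query, pool, k=2, metric="euclidean"):
    # Rank candidate demonstration embeddings by similarity to the query.
    def euclidean(a, b):
        return -math.dist(a, b)  # negate so larger score = more similar
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    score = euclidean if metric == "euclidean" else cosine
    ranked = sorted(range(len(pool)),
                    key=lambda i: score(query, pool[i]), reverse=True)
    return ranked[:k]

def build_prompt(demos, query_text):
    # Interleave retrieved (input, label) pairs as conversational turns,
    # ending with the unlabeled query for the MLLM to classify.
    turns = []
    for text, label in demos:
        turns += [{"role": "user", "content": text},
                  {"role": "assistant", "content": label}]
    turns.append({"role": "user", "content": query_text})
    return turns
```

In a real pipeline the pool embeddings would come from ResNet (images) and BERT/BioBERT/ClinicalBERT (text) over the training set, and the prompt turns would carry the actual image and report for each demonstration.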

📝 Abstract
Objectives: We aim to dynamically retrieve informative demonstrations to enhance in-context learning in multimodal large language models (MLLMs) for disease classification.
Methods: We propose a Retrieval-Augmented In-Context Learning (RAICL) framework, which integrates retrieval-augmented generation (RAG) and in-context learning (ICL) to adaptively select demonstrations with similar disease patterns, enabling more effective ICL in MLLMs. Specifically, RAICL examines embeddings from diverse encoders, including ResNet, BERT, BioBERT, and ClinicalBERT, to retrieve appropriate demonstrations, and constructs conversational prompts optimized for ICL. We evaluated the framework on two real-world multimodal datasets (TCGA and IU Chest X-ray), assessing its performance across multiple MLLMs (Qwen, LLaVA, Gemma), embedding strategies, similarity metrics, and varying numbers of demonstrations.
Results: RAICL consistently improved classification performance. Accuracy increased from 0.7854 to 0.8368 on TCGA and from 0.7924 to 0.8658 on IU Chest X-ray. Multimodal inputs outperformed single-modal ones, with text-only inputs stronger than images alone. The richness of information in each modality determines which embedding model yields better results. Few-shot experiments showed that increasing the number of retrieved demonstrations further enhanced performance. Across similarity metrics, Euclidean distance achieved the highest accuracy, while cosine similarity yielded better macro-F1 scores. RAICL demonstrated consistent improvements across various MLLMs, confirming its robustness and versatility.
Conclusions: RAICL provides an efficient and scalable approach to enhancing in-context learning in MLLMs for multimodal disease classification.
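The abstract's split result — Euclidean distance winning on accuracy, cosine similarity on macro-F1 — is possible because the two metrics can rank the same candidate pool differently when embedding magnitudes vary. A minimal illustration (the vectors are invented, not from the paper):

```python
import math

def euclidean(a, b):
    # Straight-line distance: sensitive to vector magnitude.
    return math.dist(a, b)

def cosine_sim(a, b):
    # Angle-based similarity: ignores magnitude entirely.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

query = [1.0, 1.0]
a = [2.0, 2.0]   # same direction as the query, larger magnitude
b = [1.2, 0.8]   # close in space, slightly different direction

# Euclidean prefers b (the nearer point); cosine prefers a (same direction),
# so the two metrics would retrieve different demonstrations here.
```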
Problem

Research questions and friction points this paper is trying to address.

Enhancing disease classification using retrieval-augmented in-context learning
Dynamic retrieval of similar disease patterns for effective demonstrations
Improving multimodal large language models' accuracy in medical diagnosis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic retrieval of informative disease pattern demonstrations
Integration of RAG and ICL for adaptive learning
Multi-modal embedding analysis with diverse encoders
Zaifu Zhan
PhD at University of Minnesota, MS at Tsinghua University
Natural Language Processing · Machine Learning · AI for Biomedicine · Large Language Models
Shuang Zhou
Division of Computational Health Sciences, Department of Surgery, University of Minnesota, 516 Delaware St SE, Minneapolis, 55455, MN, USA
Xiaoshan Zhou
University of Michigan
Yongkang Xiao
PhD student at the University of Minnesota
Large Language Models · Knowledge Graphs · NLP · Health Informatics
Jun Wang
Division of Computational Health Sciences, Department of Surgery, University of Minnesota, 516 Delaware St SE, Minneapolis, 55455, MN, USA
Jiawen Deng
University of Electronic Science and Technology of China
NLP · AI Safety · Affective Computing
He Zhu
Department of Chemical Engineering and Materials Science, University of Minnesota, 421 Washington Ave. SE, Minneapolis, Minneapolis, 55455, MN, USA
Yu Hou
Division of Computational Health Sciences, Department of Surgery, University of Minnesota, 516 Delaware St SE, Minneapolis, 55455, MN, USA
Rui Zhang
Division of Computational Health Sciences, Department of Surgery, University of Minnesota, 516 Delaware St SE, Minneapolis, 55455, MN, USA