MedAD-R1: Eliciting Consistent Reasoning in Interpretable Medical Anomaly Detection via Consistency-Reinforced Policy Optimization

📅 2026-02-01
🤖 AI Summary
Existing medical anomaly detection models struggle with reliable reasoning and multimodal generalization due to their reliance on fragmented data. To address this, this work introduces MedAD-38K, the first large-scale, multimodal, multicenter benchmark for medical anomaly detection, and proposes a two-stage training framework. The framework first aligns structured reasoning with responses through cognitive injection and then refines logical coherence via a consistency reinforcement strategy. The core innovation is the Con-GRPO algorithm, which uniquely integrates a consistency-based reward mechanism into policy optimization to ensure high coherence between generated reasoning and final diagnoses. Built upon large multimodal models and leveraging supervised fine-tuning, Chain-of-Thought annotations, and structured visual question answering, the proposed method, MedAD-R1, achieves state-of-the-art performance on MedAD-38K, surpassing strong baselines by over 10% and significantly enhancing both accuracy and interpretability in medical anomaly detection.

📝 Abstract
Medical Anomaly Detection (MedAD) presents a significant opportunity to enhance diagnostic accuracy using Large Multimodal Models (LMMs) to interpret and answer questions based on medical images. However, the reliance on Supervised Fine-Tuning (SFT) on simplistic and fragmented datasets has hindered the development of models capable of plausible reasoning and robust multimodal generalization. To overcome this, we introduce MedAD-38K, the first large-scale, multi-modal, and multi-center benchmark for MedAD featuring diagnostic Chain-of-Thought (CoT) annotations alongside structured Visual Question-Answering (VQA) pairs. On this foundation, we propose a two-stage training framework. The first stage, Cognitive Injection, uses SFT to instill foundational medical knowledge and align the model with a structured think-then-answer paradigm. Given that standard policy optimization can produce reasoning that is disconnected from the final answer, the second stage incorporates Consistency Group Relative Policy Optimization (Con-GRPO). This novel algorithm incorporates a crucial consistency reward to ensure the generated reasoning process is relevant and logically coherent with the final diagnosis. Our proposed model, MedAD-R1, achieves state-of-the-art (SOTA) performance on the MedAD-38K benchmark, outperforming strong baselines by more than 10%. This superior performance stems from its ability to generate transparent and logically consistent reasoning pathways, offering a promising approach to enhancing the trustworthiness and interpretability of AI for clinical decision support.
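The abstract does not spell out how the consistency reward enters policy optimization, so the following is only a rough sketch of the general idea, not the authors' Con-GRPO. It assumes a hypothetical `<think>…</think><answer>…</answer>` output format, a crude stand-in for consistency (the final answer must appear as a whole word in the reasoning), an assumed weight `w_consistency`, and the standard group-relative advantage used in GRPO-style methods (each reward normalized by the mean and standard deviation of its sampled group).

```python
import re
from statistics import mean, pstdev

# Hypothetical tag format: the source only says "think-then-answer".
PATTERN = re.compile(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>", re.DOTALL)


def split_response(text):
    """Return (reasoning, answer), or None if the format is violated."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


def reward(text, gold, w_consistency=0.5):
    """Accuracy reward plus a weighted consistency bonus (both assumptions)."""
    parts = split_response(text)
    if parts is None:
        return 0.0  # malformed output earns nothing
    reasoning, answer = parts
    accuracy = 1.0 if answer.lower() == gold.lower() else 0.0
    # Crude proxy: the answer must occur as a whole word in the reasoning.
    # A real consistency check would need a learned or rule-based judge.
    agrees = bool(answer) and re.search(
        r"\b" + re.escape(answer) + r"\b", reasoning, re.IGNORECASE
    )
    consistency = 1.0 if agrees else 0.0
    return accuracy + w_consistency * consistency


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Under these assumptions, a response whose answer is correct but unsupported by its own reasoning scores lower than one that is both correct and consistent, so the normalized advantage pushes the policy toward reasoning that agrees with the final diagnosis.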
Problem

Research questions and friction points this paper is trying to address.

Medical Anomaly Detection
Interpretable Reasoning
Multimodal Generalization
Consistency
Chain-of-Thought
Innovation

Methods, ideas, or system contributions that make the work stand out.

Consistency-Reinforced Policy Optimization
Chain-of-Thought Reasoning
Medical Anomaly Detection
Multimodal Benchmark
Interpretable AI
Haitao Zhang
School of Informatics, Xiamen University
Yingying Wang
School of Informatics, Xiamen University; Institute of Artificial Intelligence, Xiamen University
Jiaxiang Wang
King's College London
semantic communications, generative AI, machine learning, wireless communication, information theory
Haote Xu
Zhejiang Expressway Co., Ltd.; School of Transportation Science and Engineering, Beihang University
Hongyang Zhang
Assistant Professor of Computer Science, University of Waterloo
Machine Learning, Inference Acceleration, AI Security
Yirong Chen
Stanford University
Yue Huang
Professor, Xiamen University
signal processing, image processing, machine learning
Xinghao Ding
Unknown affiliation