MedAD-R1: Eliciting Consistent Reasoning in Interpretable Medical Anomaly Detection via Consistency-Reinforced Policy Optimization

📅 2026-02-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing medical anomaly detection models struggle with reliable reasoning and multimodal generalization due to their reliance on fragmented data. To address this, this work introduces MedAD-38K, the first large-scale, multimodal, multicenter benchmark for medical anomaly detection, and proposes a two-stage training framework. The framework first aligns structured reasoning with responses through cognitive injection and then refines logical coherence via a consistency reinforcement strategy. The core innovation is the Con-GRPO algorithm, which uniquely integrates a consistency-based reward mechanism into policy optimization to ensure high coherence between generated reasoning and final diagnoses. Built upon large multimodal models and leveraging supervised fine-tuning, Chain-of-Thought annotations, and structured visual question answering, the proposed method, MedAD-R1, achieves state-of-the-art performance on MedAD-38K, surpassing strong baselines by over 10% and significantly enhancing both accuracy and interpretability in medical anomaly detection.

📝 Abstract

Medical Anomaly Detection (MedAD) presents a significant opportunity to enhance diagnostic accuracy using Large Multimodal Models (LMMs) to interpret and answer questions based on medical images. However, the reliance on Supervised Fine-Tuning (SFT) on simplistic and fragmented datasets has hindered the development of models capable of plausible reasoning and robust multimodal generalization. To overcome this, we introduce MedAD-38K, the first large-scale, multi-modal, and multi-center benchmark for MedAD featuring diagnostic Chain-of-Thought (CoT) annotations alongside structured Visual Question-Answering (VQA) pairs. On this foundation, we propose a two-stage training framework. The first stage, Cognitive Injection, uses SFT to instill foundational medical knowledge and align the model with a structured think-then-answer paradigm. Given that standard policy optimization can produce reasoning that is disconnected from the final answer, the second stage incorporates Consistency Group Relative Policy Optimization (Con-GRPO). This novel algorithm incorporates a crucial consistency reward to ensure the generated reasoning process is relevant and logically coherent with the final diagnosis. Our proposed model, MedAD-R1, achieves state-of-the-art (SOTA) performance on the MedAD-38K benchmark, outperforming strong baselines by more than 10%. This superior performance stems from its ability to generate transparent and logically consistent reasoning pathways, offering a promising approach to enhancing the trustworthiness and interpretability of AI for clinical decision support.
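
The abstract does not give the Con-GRPO reward or objective, so the following is only a minimal sketch of the general idea: an accuracy reward plus a consistency term, normalized within a group of sampled responses as in standard GRPO. The `<think>`/`<answer>` tags, the keyword-based consistency proxy, the weight `w_consistency`, and all function names are assumptions made for illustration, not the paper's definitions.

```python
import re
from statistics import mean, pstdev


def parse_response(text):
    """Split a response into (reasoning, answer).

    Assumes hypothetical <think>...</think><answer>...</answer> tags;
    the paper's actual output format is not given in the abstract.
    """
    m = re.search(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>", text, re.DOTALL)
    if m is None:
        return None
    return m.group(1).strip(), m.group(2).strip()


def total_reward(text, gold, w_consistency=0.5):
    """Accuracy reward plus a weighted toy consistency reward (both assumed)."""
    parsed = parse_response(text)
    if parsed is None:
        return 0.0  # malformed output earns nothing
    reasoning, answer = parsed
    accuracy = 1.0 if answer.lower() == gold.lower() else 0.0
    # Toy proxy: the reasoning mentions the final answer as a whole word.
    pattern = r"\b" + re.escape(answer) + r"\b"
    consistency = 1.0 if re.search(pattern, reasoning, re.IGNORECASE) else 0.0
    return accuracy + w_consistency * consistency


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps).

    Uses the population standard deviation; implementations differ on this.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of sampled responses for a single image/question pair.
group = [
    "<think>The lesion margin is irregular, so this is abnormal.</think><answer>abnormal</answer>",
    "<think>The scan looks clean.</think><answer>abnormal</answer>",
    "<think>Looks normal.</think><answer>normal</answer>",
    "no tags at all",
]
rewards = [total_reward(r, gold="abnormal") for r in group]
advantages = group_relative_advantages(rewards)
```

In this toy setup, a correct answer backed by matching reasoning receives a larger advantage than a correct answer whose reasoning points elsewhere, which is the behavior the consistency reward is described as encouraging. A practical consistency signal would likely need a learned or model-based judge in place of keyword matching.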

Problem

Research questions and friction points this paper is trying to address.

Medical Anomaly Detection

Interpretable Reasoning

Multimodal Generalization

Consistency

Chain-of-Thought

Innovation

Methods, ideas, or system contributions that make the work stand out.

Consistency-Reinforced Policy Optimization

Chain-of-Thought Reasoning

Medical Anomaly Detection