ECG-R1: Protocol-Guided and Modality-Agnostic MLLM for Reliable ECG Interpretation

📅 2026-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the clinically unreliable hallucinations that existing multimodal large language models (MLLMs) frequently produce in electrocardiogram (ECG) interpretation. To enhance diagnostic reliability, we propose the first reasoning-oriented MLLM specifically designed for trustworthy ECG analysis. Our approach integrates protocol-guided instruction data generation, a decoupled architecture with interleaved modality dropout, and a reinforcement learning mechanism grounded in diagnostic evidence. Together, these innovations improve both diagnostic accuracy and cross-modal consistency. Notably, this study provides the first quantitative demonstration of how pervasive and severe the hallucination problem is in current models. To advance trustworthy multimodal medical AI, we publicly release our code and an online platform for community use and further research.

📝 Abstract
Electrocardiography (ECG) serves as an indispensable diagnostic tool in clinical practice, yet existing multimodal large language models (MLLMs) remain unreliable for ECG interpretation, often producing plausible but clinically incorrect analyses. To address this, we propose ECG-R1, the first reasoning MLLM designed for reliable ECG interpretation via three innovations. First, we construct the interpretation corpus using Protocol-Guided Instruction Data Generation, grounding interpretation in measurable ECG features and monograph-defined quantitative thresholds and diagnostic logic. Second, we present a modality-decoupled architecture with Interleaved Modality Dropout to improve robustness and cross-modal consistency when either the ECG signal or ECG image is missing. Third, we present Reinforcement Learning with ECG Diagnostic Evidence Rewards to strengthen evidence-grounded ECG interpretation. Additionally, we systematically evaluate the ECG interpretation capabilities of proprietary, open-source, and medical MLLMs, and provide the first quantitative evidence that severe hallucinations are widespread, suggesting that the public should not directly trust these outputs without independent verification. Code and data are publicly available at https://github.com/PKUDigitalHealth/ECG-R1, and an online platform can be accessed at http://ai.heartvoice.com.cn/ECG-R1/.
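The abstract's second innovation, Interleaved Modality Dropout, can be illustrated with a minimal sketch. The paper's exact scheme is not reproduced here; this assumes the general idea only: at each training step one modality (ECG signal or ECG image) may be masked, alternating between the two so the model learns to stay consistent when either input is missing. The function name and parameters are hypothetical.

```python
import random

def interleaved_modality_dropout(step, p_drop=0.3, rng=None):
    """Return a (use_signal, use_image) mask for one training step.

    Hypothetical sketch: with probability p_drop one modality is masked,
    and the masked branch is interleaved across steps so that, over
    training, the model sees signal-only, image-only, and paired inputs.
    """
    rng = rng or random.Random()
    if rng.random() >= p_drop:
        return True, True          # keep both modalities this step
    # Interleave which branch is dropped: even steps drop the image,
    # odd steps drop the signal, so neither encoder is starved of
    # gradient for long stretches of training.
    if step % 2 == 0:
        return True, False         # signal-only step
    return False, True             # image-only step
```

At least one modality is always kept, so every batch still yields a usable training example under this sketch.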
Problem

Research questions and friction points this paper is trying to address.

ECG interpretation
multimodal large language models
clinical hallucination
diagnostic reliability
modality-agnostic
Innovation

Methods, ideas, or system contributions that make the work stand out.

Protocol-Guided Instruction Data Generation
Modality-Decoupled Architecture
Interleaved Modality Dropout
Reinforcement Learning with Diagnostic Evidence Rewards
ECG Interpretation Reliability
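The "Reinforcement Learning with Diagnostic Evidence Rewards" contribution can be sketched as a reward function. The paper's actual reward design is not reproduced here; this assumes a simple additive form (a diagnosis-correctness term plus an evidence-grounding term), and the function name, feature keys, tolerance, and weights are all hypothetical.

```python
def evidence_reward(pred_dx, true_dx, cited_features, measured_features,
                    w_dx=1.0, w_ev=0.5):
    """Score one generated ECG interpretation (hypothetical sketch).

    cited_features / measured_features are dicts such as
    {"pr_interval_ms": 210}. A feature cited in the model's reasoning
    counts as grounded if it falls within 10% of the value measured
    from the raw signal; ungrounded claims earn no evidence credit.
    """
    dx_term = 1.0 if pred_dx == true_dx else 0.0
    if cited_features:
        grounded = sum(
            1 for k, v in cited_features.items()
            if k in measured_features
            and abs(v - measured_features[k]) <= 0.1 * abs(measured_features[k])
        )
        ev_term = grounded / len(cited_features)
    else:
        ev_term = 0.0  # no cited evidence -> no evidence credit
    return w_dx * dx_term + w_ev * ev_term
```

Under this sketch, a correct diagnosis whose cited measurements match the signal scores the full 1.5, while a wrong diagnosis backed by fabricated measurements scores 0, penalizing exactly the hallucination pattern the paper targets.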
Jiarui Jin
Xiaohongshu; Shanghai Jiao Tong University; University College London
Multimodal Mining, Recommender System, Information Retrieval, Large Language Model
Haoyu Wang
National Institute of Health Data Science, Peking University
Xingliang Wu
Tianjin Institute of Cardiology, the Second Hospital of Tianjin Medical University
Xiaocheng Fang
School of Intelligence Science and Technology, Peking University
Xiang Lan
NC State University
AI4SE
Zihan Wang
University of Electronic Science and Technology of China
AI Security, LLM Security
Deyun Zhang
HeartVoice Medical Technology
Bo Liu
School of Intelligence Science and Technology, Peking University
Yingying Zhang
Jarvis Lab, Tencent
Xian Wu
Director of Tencent Jarvis Lab
large language model, data mining, machine learning
Hongyan Li
School of Intelligence Science and Technology, Peking University
Shenda Hong
Assistant Professor, Peking University
AI ECG, Biosignal, AI for Digital Health, Health Data Science, AI for Healthcare