🤖 AI Summary
Existing ECG classification methods lack interpretable reasoning mechanisms aligned with clinical practice, making it difficult to provide transparent diagnostic justifications. Inspired by clinicians’ diagnostic workflows, this work proposes CardioThink, the first multimodal large language model (MLLM) framework that emulates the clinical chain of thought through structured, multi-stage reasoning: it sequentially analyzes rhythm, conduction, and waveform morphology, then forms an overall impression. The framework introduces Structured Set Policy Optimization (SSPO), a novel training strategy that learns clinically plausible reasoning trajectories without requiring manually annotated reasoning traces. Evaluated across multiple ECG benchmarks, CardioThink significantly improves diagnostic accuracy while generating interpretable reasoning paths that demonstrate high clinical validity.
📝 Abstract
Electrocardiogram (ECG) diagnosis in clinical practice relies on structured reasoning over multiple hierarchical aspects, including cardiac rhythm, conduction properties, waveform morphology, and overall diagnostic impression. However, most existing approaches predict labels directly from ECG signals without explicit clinical reasoning, resulting in opaque decisions that lack clinical alignment. To bridge this gap, we propose CardioThink, a physician-inspired multimodal large language model (MLLM) framework that explicitly models the diagnostic reasoning process through human-interpretable intermediate stages (rhythm, conduction, morphology, and impression) to derive final classification results. Furthermore, we introduce Structured Set Policy Optimization (SSPO) to jointly optimize adherence to this structured reasoning format and the accuracy of variable-size diagnostic sets, without requiring manually annotated reasoning traces. Extensive experiments on diverse ECG benchmarks show that our approach achieves significantly higher diagnostic accuracy while also providing interpretable clinical reasoning. Notably, reasoning quality evaluations confirm that SSPO substantially enhances the clinical validity of the generated rationales. These findings suggest that moving beyond direct label prediction toward structured reasoning offers a more clinically aligned direction for future ECG modeling.
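To make the idea of jointly rewarding format adherence and set-level accuracy concrete, the following is a minimal sketch of what such a reward could look like. It is not the SSPO objective itself, which the abstract does not specify. The XML-style stage tags, the `<answer>` tag, the use of set-level F1, and the `format_weight` parameter are all assumptions introduced purely for illustration.

```python
import re

# Stage order taken from the abstract; the tag syntax itself is hypothetical.
STAGES = ("rhythm", "conduction", "morphology", "impression")


def format_reward(text: str) -> float:
    """Return 1.0 if every stage appears exactly once and in order, else 0.0."""
    positions = []
    for stage in STAGES:
        pattern = rf"<{stage}>(.+?)</{stage}>"
        matches = list(re.finditer(pattern, text, flags=re.DOTALL))
        if len(matches) != 1:
            return 0.0
        positions.append(matches[0].start())
    return 1.0 if positions == sorted(positions) else 0.0


def parse_diagnoses(text: str) -> set:
    """Extract a comma-separated label set from a hypothetical <answer> tag."""
    match = re.search(r"<answer>(.*?)</answer>", text, flags=re.DOTALL)
    if match is None:
        return set()
    return {lab.strip().lower() for lab in match.group(1).split(",") if lab.strip()}


def set_f1(pred: set, gold: set) -> float:
    """F1 between two label sets; handles variable-size and empty sets."""
    if not pred and not gold:
        return 1.0
    true_pos = len(pred & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def structured_set_reward(text: str, gold: set, format_weight: float = 0.2) -> float:
    """Weighted sum of format adherence and set accuracy (weights are assumed)."""
    accuracy = set_f1(parse_diagnoses(text), gold)
    return format_weight * format_reward(text) + (1.0 - format_weight) * accuracy


if __name__ == "__main__":
    response = (
        "<rhythm>Regular sinus rhythm.</rhythm>"
        "<conduction>Prolonged PR interval.</conduction>"
        "<morphology>No ST-segment deviation.</morphology>"
        "<impression>Sinus rhythm with first-degree AV block.</impression>"
        "<answer>SR, 1AVB</answer>"
    )
    print(structured_set_reward(response, {"sr", "1avb"}))
```

A scalar of this kind could serve as the per-sample reward in a policy-gradient loop. Because it scores only the output structure and the final label set, it needs no annotated reasoning traces, which matches the constraint stated in the abstract.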