From Reports to Ontologies: Ontology-Guided Representation Learning for 12-Lead ECG

📅 2026-05-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses a gap in existing 12-lead ECG representation learning: current methods underuse the structured semantic information carried by clinical diagnostic codes. The authors propose MAR-ECG, a novel framework that, for the first time, integrates a SNOMED-CT cardiac ontology graph into ECG representation learning. By combining ontology-guided masked autoregressive modeling, graph-smoothed contrastive learning, and multi-scale physiological supervision, MAR-ECG learns high-quality representations without requiring paired clinical text. Supervision targets are softened by ontology graph distance, and evaluation uses frozen linear probing. Across five downstream tasks, MAR-ECG consistently outperforms a strong masked-autoregressive baseline, with gains in the low-label regime, and performs competitively with state-of-the-art ECG-text multimodal models.
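To make the idea of signal-derived rhythm targets concrete, here is a minimal sketch of statistics that can be computed from an ECG with no labels. The summary does not say which statistics MAR-ECG uses, so the choice of mean RR interval, heart rate, SDNN, and RMSSD is an assumption, and `rhythm_statistics` is a hypothetical helper. R-peak times are assumed to come from any upstream detector.

```python
import math
import statistics


def rhythm_statistics(r_peak_times_s):
    """Summary rhythm statistics from R-peak times given in seconds.

    Illustrative only: the statistics chosen here are assumptions,
    not the targets used by MAR-ECG.
    """
    if len(r_peak_times_s) < 3:
        raise ValueError("need at least three R peaks")
    # RR intervals: time between consecutive R peaks.
    rr = [b - a for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]
    # Successive differences of RR intervals, used for RMSSD.
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    mean_rr = statistics.fmean(rr)
    return {
        "mean_rr_s": mean_rr,
        "heart_rate_bpm": 60.0 / mean_rr,
        "sdnn_s": statistics.stdev(rr),
        "rmssd_s": math.sqrt(statistics.fmean(d * d for d in diffs)),
    }


# Hypothetical R-peak times for a regular and an irregular rhythm.
regular = rhythm_statistics([0.0, 0.8, 1.6, 2.4, 3.2])
irregular = rhythm_statistics([0.0, 0.6, 1.6, 2.1, 3.3])
print(regular["heart_rate_bpm"], irregular["sdnn_s"])
```

Values like these could serve as regression targets for an auxiliary head, which is one way supervision can be added at no annotation cost.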
📝 Abstract
The 12-lead electrocardiogram (ECG) is a quasi-periodic, multi-channel signal with diagnostic content spanning timescales from millisecond waveform morphology to multi-second rhythm dynamics. Existing ECG representation learning relies on signal-only self-supervision or ECG-text multimodal alignment, neither of which exploits the structured diagnostic codes attached to every clinical recording. We present MAR-ECG, an ontology-guided masked autoregressive framework that supervises the encoder with a curated 40-node SNOMED-CT cardiac graph through graph alignment, eliminating the need for paired clinical reports. MAR-ECG combines two complementary objectives. First, graph-smoothed contrastive learning (GSCL) anchors the encoder's rhythm-pooled features to the SNOMED graph, softening supervision targets by ontology distance so that clinically related concepts reinforce one another rather than function as hard negatives. Second, multi-scale physiological supervision complements GSCL with signal-derived patch auxiliaries that target rhythm-physiology statistics extracted automatically from the input, extending supervision beyond the patch tier at no annotation cost. Pretrained on ~40K publicly available 12-lead ECGs with SNOMED-CT codes and evaluated by frozen linear probing on five downstream classification benchmarks, MAR-ECG consistently outperforms a strong masked-autoregressive baseline, with mean gains in the low-label regime. Despite the absence of paired clinical text, MAR-ECG achieves performance competitive with state-of-the-art multimodal ECG-text methods.
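The softening of supervision targets by ontology distance can be sketched as follows. This is a minimal illustration, not the paper's GSCL objective. The toy graph, the exponential decay kernel, the `temperature` parameter, and all function names are assumptions, and the logits stand in for similarities between an ECG feature and each concept embedding.

```python
import math
from collections import deque


def graph_distances(adjacency, source):
    """Breadth-first search: hop distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def soft_targets(adjacency, label, temperature=1.0):
    """Replace a one-hot label with weights that decay with graph distance."""
    dist = graph_distances(adjacency, label)
    # Unreachable nodes get zero weight (assumed convention).
    weights = {
        n: math.exp(-dist[n] / temperature) if n in dist else 0.0
        for n in adjacency
    }
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


def soft_cross_entropy(logits, targets):
    """Cross-entropy between soft targets and softmax(logits)."""
    m = max(logits.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return -sum(targets[n] * (logits[n] - log_z) for n in logits)


# Hypothetical toy ontology (undirected), NOT the paper's 40-node graph.
TOY_GRAPH = {
    "cardiac_finding": ["arrhythmia", "conduction_disorder"],
    "arrhythmia": ["cardiac_finding", "atrial_fibrillation", "atrial_flutter"],
    "conduction_disorder": ["cardiac_finding", "left_bundle_branch_block"],
    "atrial_fibrillation": ["arrhythmia"],
    "atrial_flutter": ["arrhythmia"],
    "left_bundle_branch_block": ["conduction_disorder"],
}

targets = soft_targets(TOY_GRAPH, "atrial_fibrillation", temperature=1.0)
for name, weight in sorted(targets.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {weight:.3f}")
```

With targets like these, a prediction that confuses atrial fibrillation with atrial flutter is penalized less than one that confuses it with left bundle branch block, which is the sense in which related concepts stop acting as hard negatives.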
Problem

Research questions and friction points this paper is trying to address.

ECG representation learning
structured diagnostic codes
SNOMED-CT ontology
multimodal alignment
clinical annotation
Innovation

Methods, ideas, or system contributions that make the work stand out.

ontology-guided learning
graph-smoothed contrastive learning
masked autoregressive ECG
multi-scale physiological supervision
SNOMED-CT