Self-supervised Learning of Echocardiographic Video Representations via Online Cluster Distillation

📅 2025-06-13
🤖 AI Summary
Problem: Learning representations of cardiac ultrasound video is difficult because of subtle anatomical structures, complex temporal dynamics, and the lack of domain-specific pre-trained models. Method: DISCOVR is a dual-branch self-supervised framework in which an online cluster distillation mechanism transfers anatomical semantics from a continuously trained image encoder to a video encoder, so that the video representations capture both temporal consistency and fine-grained spatial semantics. The framework combines a clustering-based video encoder, an image encoder for fine-grained spatial features, and a semantic cluster distillation loss that links the two. It is designed to avoid aggressive augmentations that degrade clinically relevant features and to cope with the low PSNR typical of ultrasound. Contribution/Results: Across six datasets spanning fetal, pediatric, and adult populations, DISCOVR outperforms state-of-the-art video self-supervised methods and specialized video anomaly detection models under both zero-shot and linear-probe protocols, and it also transfers better to downstream segmentation.
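The summary mentions clustering-based video encoding with an online clustering step but does not say how cluster assignments are computed. A common way to do online clustering without collapse is a Sinkhorn-Knopp balanced assignment, as used in SwAV and SeLa. The sketch below shows that technique under the assumption that clip embeddings are scored by cosine similarity against K learnable prototypes; the function name `sinkhorn_knopp`, the `epsilon` and `n_iters` values, and the choice of Sinkhorn itself are illustrative assumptions, not DISCOVR's confirmed method.

```python
import math


def sinkhorn_knopp(scores, epsilon=0.05, n_iters=3):
    """Turn a (clips x prototypes) score matrix into balanced soft cluster assignments.

    Hypothetical helper (assumption: SwAV/SeLa-style balanced assignment; the
    source does not confirm DISCOVR uses it). scores[b][c] is assumed to be a
    cosine similarity in [-1, 1] between clip b and prototype c. Each returned
    row sums to 1, and the iterations push every prototype toward an equal
    share of the batch, which discourages all clips collapsing onto one cluster.
    """
    n_samples, n_protos = len(scores), len(scores[0])
    # Work in a prototypes x samples layout; subtract the global max for stability.
    peak = max(max(row) for row in scores)
    q = [[math.exp((scores[b][c] - peak) / epsilon) for b in range(n_samples)]
         for c in range(n_protos)]
    total = sum(map(sum, q))
    q = [[v / total for v in row] for row in q]
    for _ in range(n_iters):
        # Each prototype (row) gets total mass 1 / n_protos.
        for c in range(n_protos):
            row_sum = sum(q[c])
            q[c] = [v / (row_sum * n_protos) for v in q[c]]
        # Each clip (column) gets total mass 1 / n_samples.
        for b in range(n_samples):
            col_sum = sum(q[c][b] for c in range(n_protos))
            for c in range(n_protos):
                q[c][b] /= col_sum * n_samples
    # Back to clips x prototypes, rescaled so each clip's assignment sums to 1.
    return [[q[c][b] * n_samples for c in range(n_protos)] for b in range(n_samples)]


scores = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
assignments = sinkhorn_knopp(scores)
for row in assignments:
    print([round(v, 3) for v in row])
```

The equipartition step is the design choice that matters here: if every clip in a batch scored highest on the same prototype, a plain softmax would assign them all to it, whereas the balanced assignment spreads them across prototypes.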

📝 Abstract
Self-supervised learning (SSL) has achieved major advances in natural image and video understanding, but challenges remain in domains like echocardiography (heart ultrasound) due to subtle anatomical structures, complex temporal dynamics, and the current lack of domain-specific pre-trained models. Existing SSL approaches such as contrastive, masked-modeling, and clustering-based methods struggle with high intersample similarity, sensitivity to the low-PSNR inputs common in ultrasound, or aggressive augmentations that distort clinically relevant features. We present DISCOVR (Distilled Image Supervision for Cross Modal Video Representation), a self-supervised dual-branch framework for cardiac ultrasound video representation learning. DISCOVR combines a clustering-based video encoder that models temporal dynamics with an online image encoder that extracts fine-grained spatial semantics. These branches are connected through a semantic cluster distillation loss that transfers anatomical knowledge from the evolving image encoder to the video encoder, enabling temporally coherent representations enriched with fine-grained semantic understanding. Evaluated on six echocardiography datasets spanning fetal, pediatric, and adult populations, DISCOVR outperforms both specialized video anomaly detection methods and state-of-the-art video-SSL baselines in zero-shot and linear probing setups, and achieves superior segmentation transfer.
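The abstract describes a semantic cluster distillation loss that transfers knowledge from the image encoder to the video encoder, without giving its formula. A minimal sketch of one plausible form follows, assuming both branches score their embeddings against the same K prototypes and that a sharpened image-branch distribution serves as the target for the video branch's prediction (a cross-entropy, as in many cluster-based distillation setups). The function names, the two temperatures, and the shared-prototype assumption are all hypothetical; the paper's actual loss may differ.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of a list of scores at a given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_norm for s in scaled]


def cluster_distillation_loss(image_scores, video_scores,
                              teacher_temp=0.05, student_temp=0.1):
    """Mean cross-entropy from image-branch targets to video-branch predictions.

    Hypothetical sketch, not the paper's formula. image_scores[b][k] and
    video_scores[b][k] are assumed to be similarities of sample b's image-branch
    and video-branch embeddings to the same K prototypes. The lower teacher
    temperature makes the image-branch target sharper; in an autodiff framework
    the target would be detached so that only the video encoder receives this
    gradient (knowledge flows image -> video).
    """
    total = 0.0
    for img, vid in zip(image_scores, video_scores):
        target = [math.exp(v) for v in log_softmax(img, teacher_temp)]
        log_pred = log_softmax(vid, student_temp)
        total -= sum(t * lp for t, lp in zip(target, log_pred))
    return total / len(image_scores)


image_scores = [[0.9, 0.1, 0.0]]
aligned = cluster_distillation_loss(image_scores, [[0.9, 0.1, 0.0]])
misaligned = cluster_distillation_loss(image_scores, [[0.0, 0.1, 0.9]])
print(aligned < misaligned)
```

The loss is lower when the video branch puts its probability mass on the same prototype the image branch prefers, which is the sense in which the image encoder's anatomical groupings are "distilled" into the video encoder.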
Problem

Self-supervised learning for echocardiographic video analysis
Overcoming challenges in ultrasound data with high intersample similarity
Improving temporal and spatial feature extraction in cardiac ultrasound
Innovation

Self-supervised dual-branch framework for echocardiography video
Cluster distillation loss transfers anatomical knowledge from the image encoder to the video encoder
Combines temporal dynamics with spatial semantics
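Read together, these points suggest a pre-training objective of the following shape. This is a hedged sketch rather than the paper's formula: the separate image-branch term L_img, the weight lambda, the stop-gradient, and the cross-entropy form of the distillation term are assumptions made for illustration.

```latex
\mathcal{L}_{\text{total}}
  = \underbrace{\mathcal{L}_{\text{vid}}}_{\text{temporal dynamics}}
  + \underbrace{\mathcal{L}_{\text{img}}}_{\text{spatial semantics}}
  + \lambda\,\mathcal{L}_{\text{SCD}},
\qquad
\mathcal{L}_{\text{SCD}}
  = -\frac{1}{B}\sum_{b=1}^{B}\sum_{k=1}^{K}
    \operatorname{sg}\!\left(q^{\text{img}}_{bk}\right)\log p^{\text{vid}}_{bk}
```

Here B is the batch size, K the number of cluster prototypes, q^img_b the image encoder's cluster distribution for sample b, p^vid_b the video encoder's predicted distribution, and sg a stop-gradient. The stop-gradient encodes the stated direction of transfer, from the image encoder to the video encoder, while L_img stands for whatever objective keeps the image encoder training online, which the source does not specify.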
Authors

Divyanshu Mishra
DPhil Student at University of Oxford
Video Understanding, Video SSL, Medical Image Analysis, Multi-Modal Learning, Ultrasound
Mohammadreza Salehi
University of Amsterdam
Pramit Saha
Department of Engineering Science, University of Oxford
Deep Learning, Federated Learning, Multimodal Learning, Computer Vision, Medical Image Analysis
O. Patey
Nuffield Department of Women’s and Reproductive Health, University of Oxford
Aris T. Papageorghiou
Nuffield Department of Women’s and Reproductive Health, University of Oxford
Yuki M. Asano
Full Professor, Head of FunAI Lab, University of Technology Nuremberg
Deep Learning, Multimodal Learning, Self-supervised Learning, Large Model Adaptation, LLMs
J. A. Noble
Department of Engineering Science, University of Oxford