SS-DPPN: A self-supervised dual-path foundation model for generalizable cardiac audio representation

πŸ“… 2025-10-12
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address the scarcity of expert-annotated data hindering supervised learning in phonocardiogram (PCG) analysis, this paper proposes a self-supervised PCG representation learning framework based on dual-path contrastive learning. It jointly models one-dimensional time-domain waveforms and two-dimensional time-frequency spectrograms, integrating a hybrid contrastive loss with prototype-guided metric learning to achieve robust and generalizable multimodal feature representations. Evaluated on four PCG classification benchmarks, the method achieves state-of-the-art performance while requiring only one-third of the labeled data needed by fully supervised models. Moreover, it successfully transfers to lung sound classification and heart rate estimation tasks, demonstrating strong cross-task generalization. The core innovation lies in the first integration of dual-path contrastive learning with prototype-driven metric learning, significantly enhancing the discriminability and clinical applicability of unsupervised PCG representations.
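The summary describes a dual-path objective that aligns embeddings of the 1D waveform and the 2D spectrogram of the same recording. The paper's exact hybrid loss is not spelled out here; as an illustrative sketch only, a symmetric InfoNCE-style contrastive term between the two paths could look like the following (all function and variable names are hypothetical, numpy-only):

```python
import numpy as np

def dual_path_contrastive_loss(wave_emb, spec_emb, temperature=0.1):
    """Symmetric InfoNCE between waveform and spectrogram embeddings.

    Row i of wave_emb and spec_emb come from the same recording, so the
    (i, i) pairs are positives and every other pairing is a negative.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    w, s = normalize(wave_emb), normalize(spec_emb)
    logits = w @ s.T / temperature          # (B, B) similarity matrix
    n = logits.shape[0]

    def xent_diag(l):
        # cross-entropy with the matched pair (the diagonal) as the target
        l = l - l.max(axis=1, keepdims=True)        # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # symmetrize: waveform->spectrogram and spectrogram->waveform views
    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))

# toy batch of 4 recordings with 128-dim embeddings from each path
rng = np.random.default_rng(0)
loss = dual_path_contrastive_loss(rng.normal(size=(4, 128)),
                                  rng.normal(size=(4, 128)))
```

This is a generic cross-view contrastive term, not the paper's full hybrid loss, which the summary says is combined with prototype-guided metric learning.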

πŸ“ Abstract
The automated analysis of phonocardiograms is vital for the early diagnosis of cardiovascular disease, yet supervised deep learning is often constrained by the scarcity of expert-annotated data. In this paper, we propose the Self-Supervised Dual-Path Prototypical Network (SS-DPPN), a foundation model for cardiac audio representation and classification from unlabeled data. The framework introduces a dual-path contrastive-learning architecture that simultaneously processes 1D waveforms and 2D spectrograms using a novel hybrid loss. For the downstream task, a metric-learning approach based on a Prototypical Network enhances sensitivity and produces well-calibrated, trustworthy predictions. SS-DPPN achieves state-of-the-art performance on four cardiac audio benchmarks. The framework demonstrates exceptional data efficiency, matching a fully supervised model with a three-fold reduction in labeled data. Finally, the learned representations generalize successfully to lung sound classification and heart rate estimation. Our experiments and findings validate SS-DPPN as a robust, reliable, and scalable foundation model for physiological signals.
Problem

Research questions and friction points this paper is trying to address.

Automating phonocardiogram analysis with limited expert-annotated data
Learning cardiac audio representations from unlabeled data efficiently
Generalizing learned representations to other physiological signal tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised dual-path model processes waveforms and spectrograms
Uses hybrid contrastive loss for cardiac audio representation
Prototypical network enables trustworthy predictions with metric learning
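The prototypical-network head classifies by distance to class prototypes (the mean embedding of each class's labeled examples), which is what makes the predictions distance-calibrated. As a minimal sketch of that generic mechanism, not the paper's exact head (names illustrative, numpy-only):

```python
import numpy as np

def prototype_predict(support_emb, support_labels, query_emb, n_classes):
    """Classify queries by softmax over negative squared distances
    to class prototypes (per-class mean of support embeddings)."""
    prototypes = np.stack([
        support_emb[support_labels == c].mean(axis=0)
        for c in range(n_classes)
    ])
    # squared Euclidean distance from each query to each prototype
    d2 = ((query_emb[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

# toy 2D embeddings: 6 labeled support points from 2 classes, 1 query
support = np.array([[0., 0.], [0., 1.], [1., 0.],
                    [5., 5.], [5., 6.], [6., 5.]])
labels = np.array([0, 0, 0, 1, 1, 1])
probs = prototype_predict(support, labels, np.array([[0.5, 0.5]]), 2)
# the query sits near the class-0 prototype, so class 0 dominates
```

Because the class score is a monotone function of distance to a prototype, rare-class sensitivity improves by moving the prototype rather than retraining a decision boundary.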
Ummy Maria Muna
Department of Computer Science and Engineering, BRAC University, Merul Badda, Dhaka, 1212, Bangladesh.
Md Mehedi Hasan Shawon
Department of Electrical and Electronic Engineering, BRAC University, Merul Badda, Dhaka, 1212, Bangladesh.
Md Jobayer
MSc Candidate in Biomedical Engineering, LinkΓΆping University
biomedical engineering, signal processing, deep learning, computer vision
Sumaiya Akter
Department of Electrical and Electronic Engineering, BRAC University, Merul Badda, Dhaka, 1212, Bangladesh; Department of Electrical and Computer Engineering, University of Maryland, Paint Branch Drive, College Park, 20742, Maryland, United States.
Md Rakibul Hasan
PhD Candidate (Computing) at Curtin University || Senior Lecturer (on leave) at BRAC University
natural language processing, deep learning
Md. Golam Rabiul Alam
Department of Computer Science and Engineering, BRAC University, Merul Badda, Dhaka, 1212, Bangladesh.