SS-DPPN: A self-supervised dual-path foundation model for generalizable cardiac audio representation

πŸ“… 2025-10-12
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address the scarcity of expert-annotated data hindering supervised learning in phonocardiogram (PCG) analysis, this paper proposes a self-supervised PCG representation learning framework based on dual-path contrastive learning. It jointly models one-dimensional time-domain waveforms and two-dimensional time-frequency spectrograms, integrating a hybrid contrastive loss with prototype-guided metric learning to achieve robust and generalizable multimodal feature representations. Evaluated on four PCG classification benchmarks, the method achieves state-of-the-art performance while requiring only one-third of the labeled data needed by fully supervised models. Moreover, it successfully transfers to lung sound classification and heart rate estimation tasks, demonstrating strong cross-task generalization. The core innovation lies in the first integration of dual-path contrastive learning with prototype-driven metric learning, significantly enhancing the discriminability and clinical applicability of unsupervised PCG representations.
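The summary describes a dual-path objective that aligns embeddings of the 1D waveform and the 2D spectrogram of the same recording. The paper's exact hybrid loss is not spelled out here; as an illustrative sketch only, a symmetric InfoNCE-style contrastive term between the two paths could look like the following (all function and variable names are hypothetical, numpy-only):

```python
import numpy as np

def dual_path_contrastive_loss(wave_emb, spec_emb, temperature=0.1):
    """Symmetric InfoNCE between waveform and spectrogram embeddings.

    Row i of wave_emb and spec_emb come from the same recording, so the
    (i, i) pairs are positives and every other pairing is a negative.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    w, s = normalize(wave_emb), normalize(spec_emb)
    logits = w @ s.T / temperature          # (B, B) similarity matrix
    n = logits.shape[0]

    def xent_diag(l):
        # cross-entropy with the matched pair (the diagonal) as the target
        l = l - l.max(axis=1, keepdims=True)        # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # symmetrize: waveform->spectrogram and spectrogram->waveform views
    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))

# toy batch of 4 recordings with 128-dim embeddings from each path
rng = np.random.default_rng(0)
loss = dual_path_contrastive_loss(rng.normal(size=(4, 128)),
                                  rng.normal(size=(4, 128)))
```

This is a generic cross-view contrastive term, not the paper's full hybrid loss, which the summary says is combined with prototype-guided metric learning.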

πŸ“ Abstract
The automated analysis of phonocardiograms is vital for the early diagnosis of cardiovascular disease, yet supervised deep learning is often constrained by the scarcity of expert-annotated data. In this paper, we propose the Self-Supervised Dual-Path Prototypical Network (SS-DPPN), a foundation model for cardiac audio representation and classification from unlabeled data. The framework introduces a dual-path contrastive-learning architecture that simultaneously processes 1D waveforms and 2D spectrograms using a novel hybrid loss. For the downstream task, a metric-learning approach based on a Prototypical Network enhances sensitivity and produces well-calibrated, trustworthy predictions. SS-DPPN achieves state-of-the-art performance on four cardiac audio benchmarks. The framework demonstrates exceptional data efficiency, matching a fully supervised model with a three-fold reduction in labeled data. Finally, the learned representations generalize successfully to lung sound classification and heart rate estimation. Our experiments and findings validate SS-DPPN as a robust, reliable, and scalable foundation model for physiological signals.
Problem

Research questions and friction points this paper is trying to address.

Automating phonocardiogram analysis with limited expert-annotated data
Learning cardiac audio representations from unlabeled data efficiently
Generalizing learned representations to other physiological signal tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised dual-path model processes waveforms and spectrograms
Uses hybrid contrastive loss for cardiac audio representation
Prototypical network enables trustworthy predictions with metric learning
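The prototypical-network head classifies by distance to class prototypes (the mean embedding of each class's labeled examples), which is what makes the predictions distance-calibrated. As a minimal sketch of that generic mechanism, not the paper's exact head (names illustrative, numpy-only):

```python
import numpy as np

def prototype_predict(support_emb, support_labels, query_emb, n_classes):
    """Classify queries by softmax over negative squared distances
    to class prototypes (per-class mean of support embeddings)."""
    prototypes = np.stack([
        support_emb[support_labels == c].mean(axis=0)
        for c in range(n_classes)
    ])
    # squared Euclidean distance from each query to each prototype
    d2 = ((query_emb[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

# toy 2D embeddings: 6 labeled support points from 2 classes, 1 query
support = np.array([[0., 0.], [0., 1.], [1., 0.],
                    [5., 5.], [5., 6.], [6., 5.]])
labels = np.array([0, 0, 0, 1, 1, 1])
probs = prototype_predict(support, labels, np.array([[0.5, 0.5]]), 2)
# the query sits near the class-0 prototype, so class 0 dominates
```

Because the class score is a monotone function of distance to a prototype, rare-class sensitivity improves by moving the prototype rather than retraining a decision boundary.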
Ummy Maria Muna
Department of Computer Science and Engineering, BRAC University, Merul Badda, Dhaka, 1212, Bangladesh.
Md Mehedi Hasan Shawon
Department of Electrical and Electronic Engineering, BRAC University, Merul Badda, Dhaka, 1212, Bangladesh.
Md Jobayer
MSc Candidate in Biomedical Engineering, LinkΓΆping University
biomedical engineering, signal processing, deep learning, computer vision
Sumaiya Akter
Department of Electrical and Electronic Engineering, BRAC University, Merul Badda, Dhaka, 1212, Bangladesh; Department of Electrical and Computer Engineering, University of Maryland, Paint Branch Drive, College Park, 20742, Maryland, United States.
Md Rakibul Hasan
PhD Candidate (Computing) at Curtin University || Senior Lecturer (on leave) at BRAC University
natural language processing, deep learning
Md. Golam Rabiul Alam
Department of Computer Science and Engineering, BRAC University, Merul Badda, Dhaka, 1212, Bangladesh.