🤖 AI Summary
This work addresses two shortcomings of existing generative models for photoplethysmography (PPG): they struggle to preserve realistic waveform morphology while learning a latent representation that reflects cardiac and respiratory physiology, and many provide no inference path from a signal to its latent code. The authors propose VAMP-Diff, a jointly trained variational diffusion model that combines a temporal PPG encoder, a conditional one-dimensional diffusion decoder, and VampPrior regularization on a compact pooled latent. The diffusion decoder is conditioned on the full temporal latent during reconstruction, while new samples are generated from learned VampPrior components rather than a fixed Gaussian prior, which allows a more flexible latent distribution and retains an encoder-based inference path. Experiments on the CapnoBase dataset show that VAMP-Diff reconstructs sharper PPG waveforms than Gaussian-prior baselines, preserves heart-rate and respiratory-rate information, and yields reconstruction errors that respond to waveform corruptions.
📝 Abstract
Photoplethysmography (PPG) has become a ubiquitous physiological signal; however, current generative models still struggle to preserve realistic waveform morphology and to learn a latent structure that captures cardiac and respiratory physiology. PPG generators trained with adversarial losses can produce plausible waveforms, but provide no inference path from a real signal to a latent representation. Variational autoencoders, on the other hand, map PPG data to latent codes, but their decoders often blur systolic upstrokes and dampen amplitude and spectral details. Diffusion models improve waveform fidelity, but typically lack an inference path for reconstruction and physiological analysis. We propose VampPrior Latent Diffusion (VAMP-Diff), a jointly trained variational diffusion model that combines a temporal PPG encoder, a conditional one-dimensional diffusion decoder, and VampPrior regularization on a compact pooled latent. The model uses the full temporal latent during diffusion reconstruction, giving the decoder access to beat timing and morphology, and generates new samples from learned VampPrior components instead of a fixed Gaussian prior. We demonstrate on the CapnoBase dataset that VAMP-Diff produces realistic PPG signals, reconstructs sharper physiological waveforms than Gaussian-prior baselines, preserves heart-rate information, maintains respiratory-rate consistency, and is sensitive to waveform corruptions through reconstruction error.
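To make the VampPrior component more concrete, the following is a minimal NumPy sketch of the general VampPrior idea: the prior is a uniform mixture of the encoder's posteriors evaluated at K learned pseudo-inputs, p(z) = (1/K) Σ_k q(z | u_k). It is not the authors' implementation. The linear `toy_encoder`, the window length, the latent size, and the number of components are all hypothetical placeholders, and a real model would use the temporal PPG encoder with pooling and would learn the pseudo-inputs by gradient descent.

```python
import numpy as np


def toy_encoder(x, w_mu, w_lv):
    """Hypothetical stand-in for the PPG encoder plus pooling.

    Maps windows x of shape (N, T) to the mean and log-variance of a
    diagonal Gaussian over a pooled latent of shape (N, D).
    """
    return x @ w_mu, x @ w_lv


def gaussian_log_density(z, mu, log_var):
    """Log density of a diagonal Gaussian, summed over the last axis."""
    return -0.5 * np.sum(
        np.log(2.0 * np.pi) + log_var + (z - mu) ** 2 / np.exp(log_var),
        axis=-1,
    )


def vamp_prior_log_density(z, mu_k, log_var_k):
    """log p(z) under a uniform mixture of K Gaussian components.

    z: (B, D); mu_k, log_var_k: (K, D). Returns an array of shape (B,).
    """
    # Broadcast to (B, K): log density of every z under every component.
    comp = gaussian_log_density(z[:, None, :], mu_k[None], log_var_k[None])
    # Numerically stable log-sum-exp over the K components.
    m = comp.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.sum(np.exp(comp - m), axis=1))
    return lse - np.log(mu_k.shape[0])


def sample_vamp_prior(rng, mu_k, log_var_k, n):
    """Draw n latents: pick a component uniformly, then sample from it."""
    k = rng.integers(0, mu_k.shape[0], size=n)
    eps = rng.standard_normal((n, mu_k.shape[1]))
    return mu_k[k] + np.exp(0.5 * log_var_k[k]) * eps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, K = 64, 4, 8  # assumed window length, latent size, components
    pseudo_inputs = rng.standard_normal((K, T))  # learned in a real model
    w_mu = 0.1 * rng.standard_normal((T, D))
    w_lv = 0.1 * rng.standard_normal((T, D))

    # Mixture components are the encoder posteriors at the pseudo-inputs.
    mu_k, log_var_k = toy_encoder(pseudo_inputs, w_mu, w_lv)
    z = sample_vamp_prior(rng, mu_k, log_var_k, n=5)
    print(vamp_prior_log_density(z, mu_k, log_var_k).shape)
```

In a variational objective, `vamp_prior_log_density` would take the place of the standard-normal log density in the KL term, and `sample_vamp_prior` would supply the latents used for unconditional generation. With a single component of zero mean and unit variance, the mixture reduces to the fixed Gaussian prior that the abstract contrasts against.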