Variational decomposition autoencoding improves disentanglement of latent representations

📅 2026-01-11
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of learning disentangled, interpretable latent representations from complex, nonstationary, high-dimensional time-varying signals, whose rich time-frequency structure conventional variational autoencoders (VAEs) struggle to model. To this end, the authors propose variational decomposition autoencoders (DecVAEs), which integrate signal decomposition priors directly into the VAE framework. A DecVAE uses an encoder-only architecture and jointly leverages a signal decomposition model, contrastive self-supervised tasks, and variational inference to learn multi-subspace latent representations aligned with the intrinsic time-frequency characteristics of the data. Extensive experiments on synthetic data and three scientific datasets show that DecVAEs substantially outperform existing VAE approaches in disentanglement quality, cross-task generalization, and interpretability of the learned latent representations.

📝 Abstract
Understanding the structure of complex, nonstationary, high-dimensional time-evolving signals is a central challenge in scientific data analysis. In many domains, such as speech and biomedical signal processing, the ability to learn disentangled and interpretable representations is critical for uncovering latent generative mechanisms. Traditional approaches to unsupervised representation learning, including variational autoencoders (VAEs), often struggle to capture the temporal and spectral diversity inherent in such data. Here we introduce variational decomposition autoencoding (VDA), a framework that extends VAEs by incorporating a strong structural bias toward signal decomposition. VDA is instantiated through variational decomposition autoencoders (DecVAEs), i.e., encoder-only neural networks that combine a signal decomposition model, a contrastive self-supervised task, and variational prior approximation to learn multiple latent subspaces aligned with time-frequency characteristics. We demonstrate the effectiveness of DecVAEs on simulated data and three publicly available scientific datasets, spanning speech recognition, dysarthria severity evaluation, and emotional speech classification. Our results show that DecVAEs surpass state-of-the-art VAE-based methods in terms of disentanglement quality, generalization across tasks, and the interpretability of latent encodings. These findings suggest that decomposition-aware architectures can serve as robust tools for extracting structured representations from dynamic signals, with potential applications in clinical diagnostics, human-computer interaction, and adaptive neurotechnologies.
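The paper's actual objective is not reproduced on this page. Purely as an illustration of the ingredients the abstract names (per-subspace variational priors plus a contrastive self-supervised term), the following NumPy sketch combines a diagonal-Gaussian KL per latent subspace with an InfoNCE-style loss; all function names, shapes, and weightings here are hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_diag_gaussian(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), one value per sample
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def info_nce(z_a, z_b, temperature=0.1):
    # Contrastive loss: matching rows of z_a and z_b are positive pairs,
    # all other rows in the batch act as negatives.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def decvae_style_loss(mus, logvars, views, beta=1.0, gamma=1.0):
    # mus/logvars: K per-subspace posterior parameters (one subspace per
    # signal component); views: K pairs of embeddings from two augmented
    # views of the same signals. Hypothetical combination, not the
    # paper's objective.
    kl = sum(np.mean(kl_diag_gaussian(m, lv)) for m, lv in zip(mus, logvars))
    con = sum(info_nce(a, b) for a, b in views)
    return beta * kl + gamma * con

B, D, K = 8, 4, 3  # batch size, per-subspace dimension, number of subspaces
mus = [rng.normal(size=(B, D)) for _ in range(K)]
logvars = [rng.normal(scale=0.1, size=(B, D)) for _ in range(K)]
views = [(rng.normal(size=(B, D)), rng.normal(size=(B, D))) for _ in range(K)]
loss = decvae_style_loss(mus, logvars, views)
```

Both terms are nonnegative by construction, so the combined loss is bounded below by zero; a reconstruction term is omitted since the abstract describes the networks as encoder-only.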
Problem

Research questions and friction points this paper is trying to address.

disentangled representation
nonstationary signals
time-evolving signals
latent generative mechanisms
high-dimensional data
Innovation

Methods, ideas, or system contributions that make the work stand out.

variational decomposition autoencoding
disentangled representation
signal decomposition
contrastive self-supervision
time-frequency modeling
Ioannis N. Ziogas
Dept. of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, UAE
Aamna Al Shehhi
Dept. of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, UAE
Ahsan H. Khandoker
Dept. of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, UAE
Leontios J. Hadjileontiadis
Prof. ECE-Aristotle Univ. of Thessaloniki (Greece); Adjunct Prof. BMEB-Khalifa University (UAE)
Advanced Signal Processing · Biomedical Engineering · Machine Learning · Digital Phenotyping · Biomusic