🤖 AI Summary
This work addresses mode collapse and limited representational capacity in Soft-IntroVAE (S-IntroVAE) caused by fixed priors. We propose a learnable multimodal prior framework, treating the prior as a "third player" jointly optimized with the encoder and decoder under an adversarial training scheme that preserves the Nash equilibrium of the original model. Building upon a modified ELBO, we derive two theoretically motivated regularizations, adaptive variance clipping and responsibility regularization, which balance prior diversity against faithful latent mode assignment. Experiments on 2D density estimation and benchmark datasets (MNIST, Fashion-MNIST, CIFAR-10) demonstrate significant improvements in sample quality, log-likelihood scores, and semantic consistency of the latent space. To our knowledge, this is the first approach to jointly optimize prior structure, learning mechanism, and representation performance in this setting. Our results empirically validate that learnable multimodal priors yield dual benefits for both generative modeling and representation learning.
📝 Abstract
Variational Autoencoders (VAEs) are a popular framework for unsupervised learning and data generation. A plethora of methods have been proposed to improve VAEs, with the incorporation of adversarial objectives and the integration of prior learning mechanisms being prominent directions. Regarding the former, an indicative instance is the recently introduced family of Introspective VAEs, which aims to ensure that a low likelihood is assigned to unrealistic samples. In this study, we focus on the Soft-IntroVAE (S-IntroVAE) and investigate the implications of incorporating a multimodal and learnable prior into this framework. Namely, we formulate the prior as a third player and show that, when trained in cooperation with the decoder, it constitutes an effective way to learn the prior while sharing the Nash equilibrium of the vanilla S-IntroVAE. Furthermore, based on a modified formulation of the optimal ELBO in S-IntroVAE, we develop theoretically motivated regularizations, namely (i) adaptive variance clipping to stabilize training when learning the prior and (ii) responsibility regularization to discourage the formation of inactive prior modes. Finally, we perform a series of targeted experiments on a 2D density estimation benchmark and in an image generation setting comprising the (F)-MNIST and CIFAR-10 datasets, demonstrating the benefit of prior learning in S-IntroVAE for both generation and representation learning.
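To make the two regularizations concrete, the following is a minimal numpy sketch, not the authors' implementation: a learnable Gaussian-mixture prior whose per-mode variances are clipped to a fixed interval (standing in for adaptive variance clipping), plus a responsibility-based penalty that grows when some modes receive near-zero posterior mass. All class and function names, the clipping bounds, and the entropy-style penalty form are illustrative assumptions.

```python
import numpy as np

class LearnableMixturePrior:
    """Hypothetical K-mode Gaussian mixture prior over a d-dim latent space."""

    def __init__(self, n_modes=8, latent_dim=2, var_min=0.05, var_max=5.0, seed=0):
        rng = np.random.default_rng(seed)
        self.mu = rng.normal(size=(n_modes, latent_dim))  # mode means (learnable)
        self.log_var = np.zeros((n_modes, latent_dim))    # mode log-variances (learnable)
        self.logits = np.zeros(n_modes)                   # mixture-weight logits (learnable)
        self.var_min, self.var_max = var_min, var_max     # assumed clipping interval

    def clipped_var(self):
        # (i) variance clipping: keep each mode's variance inside a fixed
        # interval so no mode collapses to a point or diffuses away
        return np.clip(np.exp(self.log_var), self.var_min, self.var_max)

    def _log_joint(self, z):
        # log pi_k + log N(z_n; mu_k, var_k), shape (N, K)
        var = self.clipped_var()                                    # (K, d)
        log_pi = self.logits - np.logaddexp.reduce(self.logits)     # log softmax
        diff = z[:, None, :] - self.mu[None, :, :]                  # (N, K, d)
        log_comp = -0.5 * (np.log(2 * np.pi * var)[None]
                           + diff ** 2 / var[None]).sum(-1)
        return log_pi[None] + log_comp

    def log_prob(self, z):
        # log p(z) = logsumexp_k [ log pi_k + log N(z; mu_k, var_k) ]
        return np.logaddexp.reduce(self._log_joint(z), axis=1)      # (N,)

    def responsibilities(self, z):
        # posterior over modes, r_{nk} = p(k | z_n)
        log_joint = self._log_joint(z)
        return np.exp(log_joint
                      - np.logaddexp.reduce(log_joint, axis=1, keepdims=True))

def responsibility_penalty(r, eps=1e-8):
    # (ii) responsibility regularization (one plausible form): the negative
    # entropy of the batch-averaged responsibilities; minimizing it pushes
    # mode usage toward uniform, discouraging inactive prior modes
    avg = r.mean(axis=0)
    return float((avg * np.log(avg + eps)).sum())
```

Adding `responsibility_penalty(prior.responsibilities(z))` to the training loss would penalize configurations where most latent codes fall under a single mode, one simple way to realize the "no inactive modes" goal described above.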