Joint-stochastic-approximation Autoencoders with Application to Semi-supervised Learning

📅 2025-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing deep generative models—such as VAEs and GANs—exhibit limited capability in modeling discrete observations or latent variables, and their optimization objectives are only indirectly related to the data log-likelihood. Method: This paper introduces the Joint Stochastic Approximation (JSA) autoencoder, which directly maximizes the data log-likelihood while minimizing the inclusive KL divergence—KL(p ‖ q), measured from the true posterior to the variational inference distribution—with application to semi-supervised learning. Contribution/Results: JSA is the first discrete latent variable model successfully applied to challenging semi-supervised benchmarks (MNIST/SVHN). It unifies continuous and discrete latent representations within a single framework and demonstrates robustness to encoder-decoder architectural mismatch. Empirical results show that discrete JSA achieves performance on par with state-of-the-art continuous-latent methods, validating the effectiveness and competitiveness of discrete representations in semi-supervised generative modeling.

📝 Abstract
Our examination of existing deep generative models (DGMs), including VAEs and GANs, reveals two problems. First, their capability in handling discrete observations and latent codes is unsatisfactory, though there are interesting efforts. Second, both VAEs and GANs optimize criteria that are only indirectly related to the data likelihood. To address these problems, we formally present Joint-stochastic-approximation (JSA) autoencoders - a new family of algorithms for building deep directed generative models, with application to semi-supervised learning. The JSA learning algorithm directly maximizes the data log-likelihood and simultaneously minimizes the inclusive KL divergence between the posterior and the inference model. We provide theoretical results and conduct a series of experiments to demonstrate its advantages, such as robustness to structure mismatch between encoder and decoder and consistent handling of both discrete and continuous variables. In particular, we empirically show that JSA autoencoders with discrete latent space achieve performance comparable to other state-of-the-art DGMs with continuous latent space on semi-supervised tasks over the widely adopted datasets - MNIST and SVHN. To the best of our knowledge, this is the first demonstration that discrete latent variable models can be successfully applied to challenging semi-supervised tasks.
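The learning loop the abstract describes—draw a posterior sample with Markov chain Monte Carlo using the inference model as the proposal, then take stochastic gradient steps on log p(x, h) for the decoder and log q(h|x) for the encoder—can be sketched on a toy fully-discrete model. This is a minimal illustration, not the paper's implementation: the tabular Bernoulli parameterization, the variable names, and the use of a Metropolis independence sampler for the MCMC step are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fully-discrete model (illustrative, not the paper's setup):
#   decoder p_theta: p(h=1) = pi,  p(x=1|h) = mu[h]
#   encoder q_phi:   q(h=1|x) = rho[x]
# with binary latent h and binary observation x.
pi = 0.5                    # decoder prior p(h=1), held fixed here
mu = np.array([0.2, 0.8])   # decoder likelihood p(x=1|h)
rho = np.array([0.5, 0.5])  # encoder posterior q(h=1|x)

def log_joint(x, h):
    """log p_theta(x, h) under the tabular decoder."""
    lp_h = np.log(pi if h == 1 else 1 - pi)
    lp_x = np.log(mu[h] if x == 1 else 1 - mu[h])
    return lp_h + lp_x

def log_q(x, h):
    """log q_phi(h | x) under the tabular encoder."""
    return np.log(rho[x] if h == 1 else 1 - rho[x])

def jsa_step(x, h_prev, lr=0.1):
    """One sketched JSA update: a Metropolis independence move targeting
    p(h|x) with proposal q(h|x), then gradient steps at the accepted sample."""
    # propose h' ~ q(.|x); accept with prob min(1, w(h')/w(h_prev)),
    # where w(h) = p(x,h) / q(h|x) is the importance weight
    h_prop = int(rng.random() < rho[x])
    log_ratio = (log_joint(x, h_prop) - log_q(x, h_prop)) \
              - (log_joint(x, h_prev) - log_q(x, h_prev))
    h = h_prop if np.log(rng.random()) < min(0.0, log_ratio) else h_prev
    # ascent on log p(x|h) w.r.t. mu[h]: the Bernoulli gradient
    # preconditioned by its Fisher information reduces to (x - mu[h])
    mu[h] = np.clip(mu[h] + lr * (x - mu[h]), 1e-3, 1 - 1e-3)
    # ascent on log q(h|x) w.r.t. rho[x], same preconditioned form
    rho[x] = np.clip(rho[x] + lr * (h - rho[x]), 1e-3, 1 - 1e-3)
    return h

# a few sweeps over a tiny binary dataset
data = [0, 0, 1, 1, 1]
h = 0
for _ in range(200):
    for x in data:
        h = jsa_step(x, h)
```

Note the two gradient targets: unlike the ELBO, the decoder step ascends log p(x, h) directly at an (approximate) posterior sample, and the encoder step descends the inclusive KL(p ‖ q), which shows up here as maximizing log q(h|x) at that same sample.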
Problem

Research questions and friction points this paper is trying to address.

Improving handling of discrete observations and latent codes in DGMs
Directly maximizing data log-likelihood in generative models
Enhancing semi-supervised learning with discrete latent variable models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Maximizes data log-likelihood directly
Minimizes inclusive KL divergence effectively
Handles discrete and continuous variables consistently