🤖 AI Summary
Existing deep generative models—such as VAEs and GANs—exhibit limited capability in modeling discrete observations or latent variables, and their optimization objectives are only indirectly related to the data log-likelihood.
Method: This paper introduces the Joint Stochastic Approximation (JSA) autoencoder, which directly maximizes the data log-likelihood while minimizing the inclusive KL divergence (i.e., the KL divergence measured from the true posterior to the inference model, the opposite direction of the exclusive KL used in standard VAEs), with application to semi-supervised learning.
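In assumed notation (x the observation, h the latent code, p_θ the generative model, q_φ the inference model; symbols not given in the summary itself), the two-part objective described above can be sketched as:

```latex
\min_{\theta,\phi}\;
\underbrace{-\log p_\theta(x)}_{\text{negative data log-likelihood}}
\;+\;
\underbrace{\mathrm{KL}\!\big(p_\theta(h \mid x)\,\|\,q_\phi(h \mid x)\big)}_{\text{inclusive KL: true posterior to inference model}}
```

Note the order of arguments in the KL term: the inference model q_φ appears in the second slot, unlike the exclusive KL(q‖p) that arises in the standard ELBO.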
Contribution/Results: JSA is the first discrete latent variable model successfully applied to challenging semi-supervised benchmarks (MNIST/SVHN). It unifies continuous and discrete latent representations within a single framework and demonstrates robustness to encoder-decoder architectural mismatch. Empirical results show that discrete JSA achieves performance on par with state-of-the-art continuous-latent methods, validating the effectiveness and competitiveness of discrete representations in semi-supervised generative modeling.
📝 Abstract
Our examination of existing deep generative models (DGMs), including VAEs and GANs, reveals two problems. First, their capability in handling discrete observations and latent codes is unsatisfactory, despite some interesting efforts. Second, both VAEs and GANs optimize criteria that are only indirectly related to the data likelihood. To address these problems, we formally present Joint-stochastic-approximation (JSA) autoencoders - a new family of algorithms for building deep directed generative models, with application to semi-supervised learning. The JSA learning algorithm directly maximizes the data log-likelihood and simultaneously minimizes the inclusive KL divergence between the posterior and the inference model. We provide theoretical results and conduct a series of experiments to demonstrate its advantages, such as robustness to structure mismatch between encoder and decoder and consistent handling of both discrete and continuous variables. In particular, we empirically show that JSA autoencoders with discrete latent space achieve performance comparable to other state-of-the-art DGMs with continuous latent space on semi-supervised tasks over the widely adopted datasets - MNIST and SVHN. To the best of our knowledge, this is the first demonstration that discrete latent variable models have been successfully applied to these challenging semi-supervised tasks.
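To make the learning algorithm concrete, here is a minimal toy sketch of a JSA-style update loop. It assumes one common instantiation of joint stochastic approximation: a Metropolis independence sampler that uses the inference model q as the proposal toward the true posterior, followed by gradient steps on both models from the accepted joint sample. The tiny Bernoulli model and every parameter name below are invented for illustration; they are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy model (not the paper's architecture):
# generative p_theta: h ~ Bern(sigmoid(a)), x|h ~ Bern(sigmoid(b[h]))
# inference  q_phi:   h|x ~ Bern(sigmoid(c[x]))
a = 0.0
b = np.zeros(2)
c = np.zeros(2)

def log_p(x, h):
    """log p_theta(x, h) for the toy Bernoulli model."""
    ph = sigmoid(a)        # prior P(h = 1)
    px = sigmoid(b[h])     # likelihood P(x = 1 | h)
    return (np.log(ph) if h else np.log(1 - ph)) + \
           (np.log(px) if x else np.log(1 - px))

def log_q(x, h):
    """log q_phi(h | x)."""
    qh = sigmoid(c[x])
    return np.log(qh) if h else np.log(1 - qh)

lr = 0.05
h_cache = [0, 0]           # one Markov-chain state per observed x value

for step in range(20000):
    x = int(rng.random() < 0.8)                 # data: x ~ Bern(0.8)
    h_old = h_cache[x]
    h_new = int(rng.random() < sigmoid(c[x]))   # propose h' ~ q_phi(.|x)
    # Metropolis independence accept/reject, targeting p_theta(h | x)
    log_ratio = (log_p(x, h_new) - log_q(x, h_new)) - \
                (log_p(x, h_old) - log_q(x, h_old))
    h = h_new if np.log(rng.random()) < log_ratio else h_old
    h_cache[x] = h
    # Stochastic-approximation updates: ascend log p_theta(x, h)
    # (maximum likelihood) and log q_phi(h | x) (inclusive KL)
    # on the accepted joint sample (x, h).
    a += lr * (h - sigmoid(a))
    b[h] += lr * (x - sigmoid(b[h]))
    c[x] += lr * (h - sigmoid(c[x]))

# At a stationary point the learned marginal p_theta(x = 1)
# should match the data rate of 0.8.
marginal = sigmoid(a) * sigmoid(b[1]) + (1 - sigmoid(a)) * sigmoid(b[0])
print(round(marginal, 2))
```

Because the latent is sampled and never relaxed or reparameterized, the same loop applies unchanged to discrete and continuous latent codes, which is the consistency property the abstract highlights.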