🤖 AI Summary
Standard variational inference (VI) can lack robustness when modeling heavy-tailed latent variables, because it does not account for the geometry of the underlying distributions. To address this, the authors propose a Coupled Variational Inference (CVI) framework. Building on the coupled exponential family, they formulate a coupled free energy functional and a coupled Fisher information metric that capture the curved geometry of heavy-tailed distributions such as the generalized Pareto and Student's *t*. They further design a curvature-aware gradient optimization algorithm and a reconstruction loss that remains mean-squared error with modified constants. Implemented as a Coupled Variational Autoencoder (CVAE), the method achieves a 3% FID improvement over a standard VAE on CelebA image reconstruction after five training epochs, while reducing the proportion of outlier samples. These results suggest that CVI can penalize tail errors heavily during training while remaining robust.
📝 Abstract
We introduce an optimization framework for variational inference based on the coupled free energy, extending variational inference techniques to account for the curved geometry of the coupled exponential family. This family includes important heavy-tailed distributions such as the generalized Pareto and the Student's t. By leveraging the coupled free energy, which is equal to the coupled evidence lower bound (ELBO) of the inverted probabilities, we improve the accuracy and robustness of the learned model. The framework draws on the coupled generalizations of the Fisher information metric and the affine connection. The method is applied to the design of a coupled variational autoencoder (CVAE). By using the coupling for both the distributions and the cost functions, the reconstruction metric is derived and remains the mean-square average loss, with modified constants. The novelty comes from sampling the heavy-tailed latent distribution with its associated coupled probability, which has faster-decaying tails. As a result, the model can be trained with high penalties in the tails while ensuring that the training samples contain fewer outliers. The Wasserstein-2 distance, also known as the Fréchet Inception Distance (FID), of the reconstructed CelebA images shows that the CVAE has a 3% improvement over the VAE after 5 epochs of training.
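
The claim that the coupled probability has faster-decaying tails can be illustrated numerically. The sketch below is not the authors' implementation. It assumes the coupled probability takes an escort form, meaning the density is raised to a power `q > 1` and renormalized. The exponent `q = 1.5`, the degrees of freedom `nu = 2`, the grid, and all function names are hypothetical choices made only for this illustration.

```python
import numpy as np


def student_t_unnormalized(x, nu):
    """Student's t kernel with nu degrees of freedom (normalizer omitted)."""
    return (1.0 + x**2 / nu) ** (-(nu + 1.0) / 2.0)


def normalize(density, dx):
    """Normalize a gridded density so its Riemann sum equals 1."""
    return density / (density.sum() * dx)


def escort(density, dx, q):
    """Assumed form of the coupled probability: density**q, renormalized."""
    return normalize(density**q, dx)


def tail_mass(density, x, dx, threshold):
    """Probability mass outside [-threshold, threshold] on the grid."""
    return density[np.abs(x) > threshold].sum() * dx


# Wide, fine grid so that truncating the heavy tails has negligible effect.
x = np.linspace(-500.0, 500.0, 1_000_001)
dx = x[1] - x[0]

nu = 2.0   # hypothetical degrees of freedom (heavy tails)
q = 1.5    # hypothetical escort exponent; any q > 1 thins the tails

p = normalize(student_t_unnormalized(x, nu), dx)
p_coupled = escort(p, dx, q)

tail_p = tail_mass(p, x, dx, threshold=5.0)
tail_coupled = tail_mass(p_coupled, x, dx, threshold=5.0)

print(f"tail mass beyond |x| > 5, original density: {tail_p:.5f}")
print(f"tail mass beyond |x| > 5, escort density:   {tail_coupled:.5f}")
```

Under these assumptions, the escort density places less mass beyond the threshold than the original Student's t, which is the sense in which samples drawn from it would contain fewer outliers.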