🤖 AI Summary
This work addresses the limitation of decoder-only Transformers—namely, their lack of explicit latent structure modeling during generation, which hinders generalization to downstream tasks. We propose the first unsupervised latent variable modeling method that integrates a variational autoencoder (VAE) framework into a pure decoder-only Transformer. Specifically, we introduce learnable stochastic latent variables at each decoder layer and optimize the entire model end-to-end via variational inference, enabling label-free discovery of hierarchical, task-agnostic latent representations. This design substantially improves controllability and semantic consistency of the generative process. Experiments across six downstream tasks—including text summarization, machine translation, and dialogue generation—demonstrate consistent gains of 1.8–3.2 BLEU/ROUGE points on average. Moreover, the method exhibits enhanced robustness in low-resource settings. Our approach establishes a novel paradigm for latent-space modeling in decoder-only architectures.
📝 Abstract
We propose an extension of the decoder Transformer that conditions its generative process on random latent variables which are learned without supervision thanks to a variational procedure. Experimental evaluations show that allowing such a conditioning translates into substantial improvements on downstream tasks.