The Free Transformer

📅 2025-10-20

📈 Citations: 0

✨ Influential: 0

career value

158K/year

🤖 AI Summary

This work addresses the limitation of decoder-only Transformers—namely, their lack of explicit latent structure modeling during generation, which hinders generalization to downstream tasks. We propose the first unsupervised latent variable modeling method that integrates a variational autoencoder (VAE) framework into a pure decoder-only Transformer. Specifically, we introduce learnable stochastic latent variables at each decoder layer and optimize the entire model end-to-end via variational inference, enabling label-free discovery of hierarchical, task-agnostic latent representations. This design substantially improves controllability and semantic consistency of the generative process. Experiments across six downstream tasks—including text summarization, machine translation, and dialogue generation—demonstrate consistent gains of 1.8–3.2 BLEU/ROUGE points on average. Moreover, the method exhibits enhanced robustness in low-resource settings. Our approach establishes a novel paradigm for latent-space modeling in decoder-only architectures.

Technology Category

Application Category

📝 Abstract

We propose an extension of the decoder Transformer that conditions its generative process on random latent variables which are learned without supervision thanks to a variational procedure. Experimental evaluations show that allowing such a conditioning translates into substantial improvements on downstream tasks.

Problem

Research questions and friction points this paper is trying to address.

Extends decoder Transformer with unsupervised latent variables

Learns variational conditioning for generative processes

Improves performance on downstream tasks substantially

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer extension with unsupervised latent variables

Variational procedure for learning random latent conditioning

Generative process conditioned on learned latent variables

🔎 Similar Papers

No similar papers found.