Transformers Simulate MLE for Sequence Generation in Bayesian Networks

πŸ“… 2025-01-05
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This paper investigates whether Transformers can perform probabilistic sequence generation within a Bayesian network framework. Method: for sequential data such as natural language, the authors construct a lightweight attention architecture, a Transformer that provably implements maximum likelihood estimation (MLE) for discrete Bayesian networks exactly, and show that, given a context, it can autoregressively generate new samples conforming to the network's conditional independence structure. Contribution/Results: the paper provides the first theoretical proof that Transformers can strictly simulate both conditional probability inference and sampling in arbitrary discrete Bayesian networks. Experiments demonstrate that such a model is efficiently trainable and generates samples whose distribution closely approximates the true Bayesian network. This work suggests that large language models are fundamentally universal probabilistic sequence generators, offering a probabilistic graphical model perspective for understanding their generative mechanisms.
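To make the "in-context MLE" idea concrete, here is a minimal sketch of the estimation step the paper says a Transformer can simulate: given context samples drawn from a discrete Bayesian network, the MLE of each conditional probability table (CPT) is just a normalized count over parent configurations. The function name, the toy chain structure `X0 -> X1 -> X2`, and the uniform fallback for unseen parent configurations are illustrative assumptions, not details from the paper.

```python
from collections import Counter
from itertools import product

def mle_cpts(samples, parents, arity=2):
    """Count-based MLE of CPTs for a discrete Bayesian network.

    samples: list of value tuples, one per context sequence.
    parents[i]: tuple of parent indices of variable X_i.
    arity: number of values each variable can take (assumed uniform here).
    """
    cpts = {}
    for i, pa in enumerate(parents):
        counts = Counter()
        for s in samples:
            counts[(tuple(s[j] for j in pa), s[i])] += 1
        cpt = {}
        for pa_vals in product(range(arity), repeat=len(pa)):
            total = sum(counts[(pa_vals, v)] for v in range(arity))
            # Uniform fallback when a parent configuration never occurs
            # in the context (an assumption for this sketch).
            cpt[pa_vals] = [
                counts[(pa_vals, v)] / total if total else 1.0 / arity
                for v in range(arity)
            ]
        cpts[i] = cpt
    return cpts

# Toy chain X0 -> X1 -> X2 with binary variables.
parents = [(), (0,), (1,)]
context = [(0, 0, 0), (0, 0, 1), (1, 1, 1), (1, 1, 0)]
cpts = mle_cpts(context, parents)
print(cpts[1][(0,)])  # estimated P(X1 | X0=0) -> [1.0, 0.0]
```

The paper's contribution is that a simple attention architecture can carry out exactly this kind of counting and normalization internally, purely from the context in its prompt.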

πŸ“ Abstract
Transformers have achieved significant success in various fields, notably excelling in tasks involving sequential data like natural language processing. Despite these achievements, the theoretical understanding of transformers' capabilities remains limited. In this paper, we investigate the theoretical capabilities of transformers to autoregressively generate sequences in Bayesian networks based on in-context maximum likelihood estimation (MLE). Specifically, we consider a setting where a context is formed by a set of independent sequences generated according to a Bayesian network. We demonstrate that there exists a simple transformer model that can (i) estimate the conditional probabilities of the Bayesian network according to the context, and (ii) autoregressively generate a new sample according to the Bayesian network with estimated conditional probabilities. We further demonstrate in extensive experiments that such a transformer not only exists in theory but can also be effectively obtained through training. Our analysis highlights the potential of transformers to learn complex probabilistic models and contributes to a better understanding of large language models as a powerful class of sequence generators.
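The generation step (ii) described in the abstract corresponds to ancestral sampling: with conditional probabilities in hand, a new sequence is drawn one variable at a time, each conditioned on its already-generated parents. The sketch below is an illustrative stand-in for what the transformer is shown to simulate; the tiny chain network, its CPT values, and the function name are made-up assumptions for this example.

```python
import random

def sample_sequence(parents, cpts, rng):
    """Ancestral sampling from a discrete Bayesian network.

    Variables are assumed to be indexed in topological order, so every
    parent of X_i has already been sampled when X_i is drawn.
    """
    seq = []
    for i, pa in enumerate(parents):
        pa_vals = tuple(seq[j] for j in pa)
        probs = cpts[i][pa_vals]       # P(X_i = v | parents = pa_vals)
        r, acc = rng.random(), 0.0
        for v, p in enumerate(probs):  # inverse-CDF draw over {0, 1, ...}
            acc += p
            if r < acc:
                seq.append(v)
                break
        else:
            seq.append(len(probs) - 1)
    return tuple(seq)

# Toy chain X0 -> X1 -> X2 with binary variables and hand-picked CPTs.
parents = [(), (0,), (1,)]
cpts = {
    0: {(): [0.5, 0.5]},
    1: {(0,): [0.9, 0.1], (1,): [0.1, 0.9]},
    2: {(0,): [0.8, 0.2], (1,): [0.2, 0.8]},
}
rng = random.Random(0)
print(sample_sequence(parents, cpts, rng))
```

In the paper's setting, the CPTs fed to this sampler would themselves be the in-context MLE estimates, and the claim is that a single transformer performs both the estimation and this autoregressive draw.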
Problem

Research questions and friction points this paper is trying to address.

Transformers
Bayesian Networks
Sequence Generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformers
Bayesian Networks
Sequential Data