🤖 AI Summary
This work addresses the lack of a rigorous characterization of the expressive power of practical Transformer models—specifically those employing finite-precision arithmetic, soft attention, and strict causal masking. We establish, for the first time, an exact expressive equivalence between such models and the fragment of Linear Temporal Logic (LTL) containing only past-time operators. Methodologically, we integrate formal language theory, finite automata, algebraic linguistics, and LTL to obtain a bidirectional structural correspondence between Transformer architectures and logical formulas. Our key contributions are: (1) introducing a “realistic Transformer” model that replaces idealized assumptions (e.g., infinite precision, hard attention) with engineering-relevant constraints; (2) unifying deep learning architectures, temporal logic, automata theory, and algebraic language theory within a single theoretical framework; and (3) providing rigorous proofs and empirical validation—demonstrating length generalization on languages expressible in this LTL fragment and consistent failure on languages outside it.
📝 Abstract
Transformer-based language models (LMs) have achieved widespread empirical success, but their theoretical expressive power remains only partially understood. Prior work often relies on idealized models whose assumptions, such as arbitrary numerical precision and hard attention, diverge from real-world transformers. In this work, we provide an exact characterization of fixed-precision transformers with strict future masking and soft attention, an idealization that more closely mirrors practical implementations. We show that these models are precisely as expressive as a specific fragment of linear temporal logic that includes only a single temporal operator: the past operator. We further relate this logic to established classes in formal language theory, automata theory, and algebra, yielding a rich and unified theoretical framework for understanding transformer expressivity. Finally, we present empirical results that align closely with our theory: transformers trained on languages within their theoretical capacity generalize perfectly across input lengths, while they consistently fail to generalize on languages beyond it.
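To make the logic concrete, here is a minimal sketch of how a past-only LTL fragment can define a language. The formula encoding, the strict-past reading of the operator `P` ("held at some strictly earlier position"), and the convention that a word is accepted when the formula holds at its last position are illustrative assumptions, not the paper's exact definitions.

```python
def holds(formula, word, i):
    """Evaluate a past-LTL formula at position i of word.

    Formulas are nested tuples:
      ("sym", c)        - current symbol is c
      ("not", phi)      - negation
      ("and", phi, psi) - conjunction
      ("P", phi)        - phi held at some strictly earlier position
    """
    op = formula[0]
    if op == "sym":
        return word[i] == formula[1]
    if op == "not":
        return not holds(formula[1], word, i)
    if op == "and":
        return holds(formula[1], word, i) and holds(formula[2], word, i)
    if op == "P":
        # "once in the past": true if phi held at any position j < i
        return any(holds(formula[1], word, j) for j in range(i))
    raise ValueError(f"unknown operator: {op}")

def accepts(formula, word):
    """A nonempty word is in the language if the formula holds at its end."""
    return bool(word) and holds(formula, word, len(word) - 1)

# Example language: words ending in 'b' in which an 'a' occurred earlier.
phi = ("and", ("sym", "b"), ("P", ("sym", "a")))
```

For instance, `accepts(phi, "aab")` is true while `accepts(phi, "b")` is false, since no `a` precedes the final `b`. The paper's result says such past-LTL-definable languages are exactly those recognized by the fixed-precision, strictly future-masked, soft-attention transformers studied here.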