🤖 AI Summary
This work investigates the expressive power of Transformer models for computing arbitrary functions $f:[n]\to[n]$. Focusing on how input encoding affects depth requirements, we establish—via rigorous constructive proof—that a single-layer Transformer suffices to evaluate any such function under a carefully designed input representation, whereas at least two layers are necessary under more natural positional encodings. We further derive a polylogarithmic constraint linking expressive capacity to parameter complexity (the product of the number of attention heads, the embedding dimension, and the numerical precision). Our methodology integrates constructive circuit simulation, formal modeling of attention-based computability, discrete function encoding design, and controlled-scale empirical validation. Theoretically, we precisely characterize the functional evaluation boundary between one- and two-layer Transformers; empirically, we demonstrate strong alignment between theoretical computability and practical learnability across diverse function classes.
📝 Abstract
While transformers have proven enormously successful in a range of tasks, their fundamental properties as models of computation are not well understood. This paper contributes to the study of the expressive capacity of transformers, focusing on their ability to perform the fundamental computational task of evaluating an arbitrary function from $[n]$ to $[n]$ at a given argument. We prove that concise 1-layer transformers (i.e., with a polylog bound on the product of the number of heads, the embedding dimension, and the precision) can perform this task under some representations of the input, but not when the function's arguments and values are encoded only in different input positions. Concise 2-layer transformers can perform the task even with the more difficult input representation. Experimentally, we find a rough alignment between what we have proven can be computed by concise transformers and what can be practically learned.
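To make the task and the two input representations concrete, here is a minimal sketch (all names and encodings are illustrative assumptions, not the paper's actual construction): a function $f:[n]\to[n]$ is given as a table, and the model must output $f(x)$ for a query argument $x$. In the "paired" encoding each position holds an (argument, value) pair, so a single attention lookup can match the query against keys; in the "split" encoding, arguments and values occupy different positions, which corresponds to the harder representation for which the paper shows one layer does not suffice.

```python
# Hypothetical sketch of the function-evaluation task described in the
# abstract. The encodings below are illustrative, not the paper's exact ones.

def encode_paired(f_table, x):
    # One token per position: the pair (i, f(i)); the query comes last.
    # A single attention head can match the query key against position keys.
    return [(i, v) for i, v in enumerate(f_table)] + [("query", x)]

def encode_split(f_table, x):
    # Arguments and values in *separate* positions; associating an
    # argument with its value now requires relating two positions.
    tokens = []
    for i, v in enumerate(f_table):
        tokens.append(("arg", i))
        tokens.append(("val", v))
    tokens.append(("query", x))
    return tokens

def evaluate(f_table, x):
    # Ground truth the transformer must compute: f(x).
    return f_table[x]

f = [2, 0, 3, 1]           # a function over [4]
print(evaluate(f, 2))      # -> 3
print(len(encode_paired(f, 2)))   # 4 pairs + 1 query token -> 5
print(len(encode_split(f, 2)))    # 8 arg/val tokens + 1 query -> 9
```

The contrast the paper studies is exactly this: whether the argument-to-value association is available within a single position (one layer suffices) or must be assembled across positions (two layers are needed for concise transformers).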