NoPE: The Counting Power of Transformers with No Positional Encodings

📅 2025-05-16
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This paper investigates the counting expressiveness of Transformer models without positional encodings (NoPE) that employ average hard attention (AHAT). The central question is whether such models can recognize languages corresponding to nonnegative integer solutions of Diophantine equations, or fundamental counting properties like PARITY. Methodologically, the authors give a precise characterization of the expressive power of NoPE-AHAT Transformers via formal language theory and algebraic geometry. Their main result establishes that NoPE-AHAT exactly captures the class of semi-algebraic languages, i.e., languages defined by finite unions of sets of nonnegative integer solutions to systems of multivariate polynomial inequalities over letter counts. This implies strictly greater counting power than counter machines and Petri nets, yet an inability to recognize PARITY; moreover, associated decision problems (e.g., whether a NoPE Transformer classifies all inputs into one class) are undecidable. Finally, the authors exhibit a counting language computable in TC⁰ that is provably inexpressible by any average-hard-attention Transformer, even with arbitrary positional encodings, answering an open problem.
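To make the characterization concrete, here is a minimal sketch (not from the paper) of what a semi-algebraic counting language looks like: membership depends only on the vector of letter counts, which must satisfy a polynomial inequality. The alphabet, the particular inequality, and the function name `in_language` are illustrative assumptions.

```python
from collections import Counter

def in_language(w: str) -> bool:
    """Hypothetical semi-algebraic counting language over {a, b, c}:
    accept w iff (#a)^2 + (#b)^2 <= (#c)^2, a single multivariate
    polynomial inequality over the nonnegative integer letter counts."""
    counts = Counter(w)
    na, nb, nc = counts["a"], counts["b"], counts["c"]
    return na**2 + nb**2 <= nc**2

print(in_language("abccc"))  # 1 + 1 <= 9  -> True
print(in_language("aab"))    # 4 + 1 <= 0  -> False
```

Note that acceptance ignores the order of symbols entirely, matching the bag-of-words character of NoPE Transformers noted in the abstract.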

Technology Category

Application Category

๐Ÿ“ Abstract
Positional Encodings (PEs) seem to be indispensable for ensuring expressiveness of transformers; without them attention transformers reduce to a bag-of-words model. NoPE-transformers (i.e. with No PEs) with unique hard attention mechanisms were very recently shown to only be able to express regular languages, i.e., with limited counting ability. This paper shows that, with average hard attention mechanisms, NoPE-transformers are still surprisingly expressive: they can express counting languages corresponding to nonnegative integer solutions to multivariate polynomial equations (i.e. Diophantine equations), reasoning about which is well-known to be undecidable. In fact, we provide a precise characterization of languages expressible by Average Hard Attention NoPE-Transformers (NoPE-AHATs): they correspond precisely to what we call *semi-algebraic sets*, i.e., finite unions of sets of nonnegative integer solutions to systems of multivariate polynomial inequations. We obtain several interesting consequences of our characterization. Firstly, NoPE-transformers can express counting properties that are far more complex than established models like simplified counter machines and Petri nets, but cannot express a very simple counting property of PARITY. Secondly, the problem of analyzing NoPE-transformers is undecidable, e.g., whether a given NoPE transformer classifies all input strings in one class. To complement our results, we exhibit a counting language that is not expressible by average hard attention transformers even with arbitrary PEs but is expressible in the circuit complexity class TC⁰, answering an open problem.
Problem

Research questions and friction points this paper is trying to address.

Characterizing expressiveness of NoPE-transformers with average hard attention
Comparing NoPE-transformers' counting ability to established models
Determining decidability of analyzing NoPE-transformers' classification behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

NoPE-transformers use average hard attention mechanisms
Express counting languages via Diophantine equations
Characterize languages as semi-algebraic sets
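For contrast, PARITY is trivial to compute yet, per the paper, lies outside the NoPE-AHAT class: whether the count of 1s is even is a divisibility condition, which no finite union of polynomial-inequality solution sets over counts can express. A minimal sketch (the function name is illustrative, not from the paper):

```python
def parity(w: str) -> bool:
    # PARITY: accept iff the number of '1' symbols is even.
    # Acceptance depends on (#1 mod 2), a divisibility condition,
    # not on whether the counts satisfy polynomial inequalities --
    # which is why it falls outside the semi-algebraic characterization.
    return w.count("1") % 2 == 0

print(parity("1101"))  # three 1s -> False
print(parity("1001"))  # two 1s   -> True
```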
🔎 Similar Papers
No similar papers found.