🤖 AI Summary
Classical circuit complexity classes (e.g., NC, AC, TC) fail to characterize the physical realizability and scalability of Transformer attention mechanisms.
Method: We introduce the novel notion of *local uniformity* and define the RC(·) family of complexity classes—tailored to physically implementable circuits—thereby formally incorporating hardware constraints such as wiring density and signal propagation delay into circuit complexity analysis. By establishing a quantitative lower bound on attention computation time in terms of input data entropy growth, we analyze the fundamental trade-off between runtime efficiency and data distribution complexity.
Contribution/Results: We prove that any attention mechanism requiring ω(n^{3/2}) time cannot sustainably adapt to increasingly high-entropy datasets. This work exposes an inherent limitation of traditional complexity theory in distinguishing model expressivity under physical constraints, and provides the first rigorous, physics-grounded theoretical bound on the hardware scalability of Transformer architectures.
📝 Abstract
We argue that the standard circuit complexity measures derived from $\mathsf{NC}$, $\mathsf{AC}$, and $\mathsf{TC}$ provide limited practical information and are now insufficient to further differentiate model expressivity. To address these limitations, we define a novel notion of local uniformity and a family of circuit complexity classes $RC(\cdot)$ that capture the fundamental constraints of scaling physical circuits. Through the lens of $RC(\cdot)$, we show that attention mechanisms with $\omega(n^{3/2})$ runtime cannot scale to accommodate the entropy of increasingly complex datasets. Our results simultaneously provide a methodology for defining meaningful bounds on transformer expressivity and naturally expose the restricted viability of attention.
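As a rough numerical illustration (not taken from the paper): standard self-attention materializes an $n \times n$ score matrix, so its runtime grows as $\Theta(n^2)$ in sequence length $n$, which is already in $\omega(n^{3/2})$ and thus falls on the unscalable side of the threshold discussed above. The sketch below compares the two growth rates; the function names and the chosen sequence lengths are illustrative assumptions, not quantities from the paper.

```python
# Illustrative sketch (assumptions, not the paper's analysis):
# compare standard attention's Theta(n^2) score-matrix cost against
# the n^(3/2) scalability threshold from the abstract.

def attention_ops(n: int) -> int:
    # Standard self-attention forms an n x n attention score matrix.
    return n * n

def threshold_ops(n: int) -> float:
    # The paper's cutoff: runtimes in omega(n^(3/2)) cannot scale.
    return n ** 1.5

for n in (1_000, 10_000, 100_000):
    # The gap n^2 / n^(3/2) = sqrt(n) grows without bound,
    # so quadratic attention diverges from the threshold.
    ratio = attention_ops(n) / threshold_ops(n)
    print(f"n={n:>7}: attention exceeds threshold by {ratio:.1f}x")
```

The ratio between the two costs is exactly $\sqrt{n}$, which makes the divergence easy to see: at $n = 10{,}000$ quadratic attention already costs 100x the $n^{3/2}$ budget.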