🤖 AI Summary
Classical circuit complexity classes (e.g., NC, AC, TC) fail to characterize the physical realizability and scalability of Transformer attention mechanisms.
Method: We introduce the novel notion of *local uniformity* and define the RC(·) family of complexity classes—tailored to physically implementable circuits—thereby formally incorporating hardware constraints such as wiring density and signal propagation delay into circuit complexity analysis. By establishing a quantitative lower bound on attention computation time in terms of input data entropy growth, we analyze the fundamental trade-off between runtime efficiency and data distribution complexity.
Contribution/Results: We prove that any attention mechanism requiring ω(n^{3/2}) time cannot sustainably adapt to increasingly high-entropy datasets. This work exposes an inherent limitation of traditional complexity theory in distinguishing model expressivity under physical constraints, and provides the first rigorous, physics-grounded theoretical bound on the hardware scalability of Transformer architectures.
📝 Abstract
We argue that the standard circuit complexity measures derived from $\mathsf{NC}$, $\mathsf{AC}$, and $\mathsf{TC}$ provide limited practical information and are now insufficient to further differentiate model expressivity. To address these limitations, we define a novel notion of local uniformity and a family of circuit complexity classes $RC(\cdot)$ that capture the fundamental constraints of scaling physical circuits. Through the lens of $RC(\cdot)$, we show that attention mechanisms with $\omega(n^{3/2})$ runtime cannot scale to accommodate the entropy of increasingly complex datasets. Our results simultaneously provide a methodology for defining meaningful bounds on transformer expressivity and naturally expose the restricted viability of attention.
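As a rough numerical illustration (not taken from the paper): standard self-attention materializes an $n \times n$ score matrix, so its runtime grows as $\Theta(n^2)$ in sequence length $n$, which is already in $\omega(n^{3/2})$ and thus falls on the unscalable side of the threshold discussed above. The sketch below compares the two growth rates; the function names and the chosen sequence lengths are illustrative assumptions, not quantities from the paper.

```python
# Illustrative sketch (assumptions, not the paper's analysis):
# compare standard attention's Theta(n^2) score-matrix cost against
# the n^(3/2) scalability threshold from the abstract.

def attention_ops(n: int) -> int:
    # Standard self-attention forms an n x n attention score matrix.
    return n * n

def threshold_ops(n: int) -> float:
    # The paper's cutoff: runtimes in omega(n^(3/2)) cannot scale.
    return n ** 1.5

for n in (1_000, 10_000, 100_000):
    # The gap n^2 / n^(3/2) = sqrt(n) grows without bound,
    # so quadratic attention diverges from the threshold.
    ratio = attention_ops(n) / threshold_ops(n)
    print(f"n={n:>7}: attention exceeds threshold by {ratio:.1f}x")
```

The ratio between the two costs is exactly $\sqrt{n}$, which makes the divergence easy to see: at $n = 10{,}000$ quadratic attention already costs 100x the $n^{3/2}$ budget.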