🤖 AI Summary
This work formalizes the notion of "algorithmic grokking," distinguishing genuine algorithmic understanding by neural networks from mere statistical interpolation, and investigates the generalization of Transformers across problem sizes under computational-complexity constraints. By analyzing infinite-width Transformers in both the lazy and rich training regimes, and by combining computational complexity theory with the Efficient Polynomial Time Heuristic Scheme (EPTHS) framework, the study establishes, for the first time, an upper bound on the inference-time computational complexity of the functions such models can learn. The findings show that despite their universal expressivity, Transformers have an inherent inductive bias toward low-complexity algorithms (such as search, copy, and sort) and fail to generalize to higher-complexity tasks, elucidating fundamental limitations and preferences in their algorithmic learning behavior.
📝 Abstract
We formally define Algorithmic Capture (i.e., "grokking" an algorithm) as the ability of a neural network to generalize to arbitrary problem sizes ($T$) with controllable error and minimal sample adaptation, distinguishing true algorithmic learning from statistical interpolation. By analyzing infinite-width transformers in both the lazy and rich regimes, we derive upper bounds on the inference-time computational complexity of the functions these networks can learn. We show that despite their universal expressivity, transformers possess an inductive bias towards low-complexity algorithms within the Efficient Polynomial Time Heuristic Scheme (EPTHS) class. This bias effectively prevents them from capturing higher-complexity algorithms, while allowing success on simpler tasks like search, copy, and sort.
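To make the Algorithmic Capture criterion concrete, here is a minimal hypothetical sketch of the kind of length-generalization probe it implies: a model that has truly grokked an algorithm (here, the copy task) should keep its error below a fixed tolerance as the problem size $T$ grows well beyond any training range. All function names and parameter values are illustrative assumptions, not definitions from the paper.

```python
import random

def copy_task(seq):
    """Ground-truth 'copy' algorithm: the output equals the input."""
    return list(seq)

def error_rate(model, T, n_trials=100):
    """Fraction of random length-T inputs the model fails to copy exactly."""
    failures = 0
    for _ in range(n_trials):
        seq = [random.randint(0, 9) for _ in range(T)]
        if model(seq) != copy_task(seq):
            failures += 1
    return failures / n_trials

def captures_algorithm(model, sizes=(8, 64, 512), eps=0.05):
    """Declare 'capture' only if error stays below eps at every probed size T.

    A statistical interpolator tuned to short inputs will typically pass at
    small T but fail at the larger, out-of-range sizes.
    """
    return all(error_rate(model, T) <= eps for T in sizes)
```

For example, `captures_algorithm(lambda s: list(s))` succeeds at every probed size, while a truncating shortcut like `lambda s: s[:4]` fails as soon as $T$ exceeds its memorized window.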