Non-Asymptotic Length Generalization

📅 2025-06-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the non-asymptotic length generalization problem by proposing the first computable theoretical framework for determining the minimal training input length, termed *length complexity*, required to guarantee generalization to longer inputs. Methodologically, it combines formal language theory and computational complexity analysis with the Minimum-Complexity Interpolator (MCI) learning algorithm, which is shown to achieve optimal length complexity, and it establishes that a function class admits non-asymptotic length generalization if and only if its language equivalence problem is decidable. Key contributions include: (i) a tight bound of 2n - 2 on the length complexity of deterministic finite automata (DFAs) with n states; (ii) upper bounds of O(T^2) for 1-layer and O(T^{O(K)}) for 2-layer C-RASP functions (a transformer-related function class due to Yang & Chiang, 2024), where T is the precision and K the number of heads of the ground-truth function; and (iii) the first proof that context-free grammars admit no computable upper bound on length complexity, thereby establishing a fundamental theoretical limit on length generalization.

📝 Abstract
Length generalization is the ability of a learning algorithm to learn a hypothesis which generalizes to longer inputs than the inputs in the training set. In this paper, we provide provable guarantees of length generalization for various classes of functions in an idealized setting. First, we formalize the framework of non-asymptotic length generalization, which requires a computable upper bound for the minimum input length that guarantees length generalization, as a function of the complexity of ground-truth function under some given complexity measure. We refer to this minimum input length to length generalize as length complexity. We show the Minimum-Complexity Interpolator learning algorithm achieves optimal length complexity. We further show that whether a function class admits non-asymptotic length generalization is equivalent to the decidability of its language equivalence problem, which implies that there is no computable upper bound for the length complexity of Context-Free Grammars. On the positive side, we show that the length complexity of Deterministic Finite Automata is $2n - 2$ where $n$ is the number of states of the ground-truth automaton. Our main results are upper bounds of length complexity for a subset of a transformer-related function class called C-RASP (Yang&Chiang, 2024). We show that the length complexity of 1-layer C-RASP functions is $O(T^2)$ when the ground-truth function has precision $T$, and that the length complexity of 2-layer C-RASP functions is $O(T^{O(K)})$ when the ground-truth function has precision $T$ and $K$ heads.
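The abstract's DFA result reflects a classical fact: two DFAs with at most n states each that agree on every string of length up to 2n - 2 accept the same language, so training data of that length suffices to pin down the ground truth. A brute-force sketch of this check, using two hand-made 2-state DFAs (illustrative examples, not from the paper):

```python
from itertools import product

def run(dfa, s):
    """dfa = (transitions, accepting_states, start); transitions maps
    (state, symbol) -> state. Return True iff the DFA accepts s."""
    trans, accept, state = dfa
    for c in s:
        state = trans[(state, c)]
    return state in accept

def agree_up_to(d1, d2, max_len):
    """Check that d1 and d2 give the same answer on every binary
    string of length <= max_len."""
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            if run(d1, "".join(bits)) != run(d2, "".join(bits)):
                return False
    return True

# Two 2-state DFAs for "odd number of 1s", written with different state names.
d1 = ({(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}, {1}, 0)
d2 = ({("a", "0"): "a", ("a", "1"): "b",
      ("b", "0"): "b", ("b", "1"): "a"}, {"b"}, "a")

# With n = 2 states, agreement on all lengths <= 2*2 - 2 = 2
# certifies language equivalence.
print(agree_up_to(d1, d2, 2))
```

The exponential cost of enumerating all short strings is what makes this only a sketch; the point is that a finite, computable length bound exists at all, which the paper shows fails for context-free grammars.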
Problem

Research questions and friction points this paper is trying to address.

Proving length generalization guarantees for function classes
Determining length complexity for learning algorithms
Establishing bounds for transformer-related C-RASP functions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Non-asymptotic length generalization framework
Optimal length complexity with Minimum-Complexity Interpolator
Length complexity bounds for C-RASP functions
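The Minimum-Complexity Interpolator named above can be sketched in miniature: enumerate hypotheses in order of increasing complexity and return the first one consistent with the training set. The hypothesis class and complexity measure below (threshold-counting functions, complexity = threshold) are illustrative stand-ins, not the paper's C-RASP construction:

```python
def make_threshold(k):
    """Hypothesis with complexity k: accept a binary string iff it
    contains at least k ones."""
    return lambda s: s.count("1") >= k

def mci(train, max_complexity=20):
    """Minimum-Complexity Interpolator sketch: scan hypotheses in
    order of increasing complexity k and return the first one that
    interpolates (fits exactly) the training data."""
    for k in range(max_complexity + 1):
        h = make_threshold(k)
        if all(h(x) == y for x, y in train):
            return k, h
    return None

# Short training strings; MCI recovers the minimal consistent threshold.
train = [("0", False), ("1", False), ("11", True), ("101", True)]
k, h = mci(train)
```

The paper's guarantee is that, for suitable classes, this minimum-complexity choice on inputs up to the length-complexity bound is forced to equal the ground truth, and hence generalizes to arbitrarily long inputs.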
👥 Authors
Thomas Chen (University of Texas at Austin; Analysis, Deep Learning, Mathematical Physics)
Tengyu Ma (Stanford University)
Zhiyuan Li (Toyota Technological Institute at Chicago)