🤖 AI Summary
This work addresses the non-asymptotic length generalization problem by proposing a computable theoretical framework to determine the minimal training input length, termed *length complexity*, required to guarantee generalization to longer inputs. Methodologically, it combines formal language theory and computational complexity analysis with the Minimum-Complexity Interpolator (MCI) learning algorithm, which is shown to achieve optimal length complexity, and it establishes an equivalence between a function class admitting non-asymptotic length generalization and the decidability of its language equivalence problem. Key contributions include: (i) a tight bound of $2n-2$ on the length complexity of deterministic finite automata (DFAs) with $n$ states; (ii) upper bounds of $O(T^2)$ for 1-layer C-RASP and $O(T^{O(K)})$ for 2-layer C-RASP, a transformer-related function class, where $T$ is the precision of the ground-truth function and $K$ is its number of heads; and (iii) a proof that context-free grammars admit no computable upper bound on length complexity, thereby establishing a fundamental theoretical limit on provable length generalization.
📝 Abstract
Length generalization is the ability of a learning algorithm to learn a hypothesis that generalizes to inputs longer than those in the training set. In this paper, we provide provable guarantees of length generalization for various classes of functions in an idealized setting. First, we formalize the framework of non-asymptotic length generalization, which requires a computable upper bound for the minimum input length that guarantees length generalization, as a function of the complexity of the ground-truth function under some given complexity measure. We refer to this minimum input length needed to length-generalize as the length complexity. We show that the Minimum-Complexity Interpolator learning algorithm achieves optimal length complexity. We further show that whether a function class admits non-asymptotic length generalization is equivalent to the decidability of its language equivalence problem, which implies that there is no computable upper bound for the length complexity of Context-Free Grammars. On the positive side, we show that the length complexity of Deterministic Finite Automata is $2n - 2$, where $n$ is the number of states of the ground-truth automaton. Our main results are upper bounds on the length complexity of a subset of a transformer-related function class called C-RASP (Yang & Chiang, 2024). We show that the length complexity of 1-layer C-RASP functions is $O(T^2)$ when the ground-truth function has precision $T$, and that the length complexity of 2-layer C-RASP functions is $O(T^{O(K)})$ when the ground-truth function has precision $T$ and $K$ heads.
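To make the Minimum-Complexity Interpolator and the $2n-2$ DFA bound concrete, here is a minimal Python sketch, not taken from the paper: MCI specialized to DFAs, with the number of states as the complexity measure. The names `mci_dfa` and `run_dfa`, the brute-force enumeration, and the toy language are our illustrative assumptions.

```python
from itertools import product

def run_dfa(dfa, s):
    """Run a DFA given as (state_count, transition dict, accepting set)
    on string s, starting from state 0; return True iff s is accepted."""
    n, delta, accept = dfa
    q = 0
    for ch in s:
        q = delta[(q, ch)]
    return q in accept

def mci_dfa(examples, max_states=4, alphabet="ab"):
    """Brute-force MCI for DFAs: return the DFA with the fewest states
    that is consistent with every (string, label) example, or None."""
    for n in range(1, max_states + 1):
        keys = [(q, a) for q in range(n) for a in alphabet]
        # Enumerate all transition functions and accepting-state sets.
        for targets in product(range(n), repeat=len(keys)):
            delta = dict(zip(keys, targets))
            for bits in product([False, True], repeat=n):
                accept = {q for q in range(n) if bits[q]}
                dfa = (n, delta, accept)
                if all(run_dfa(dfa, s) == y for s, y in examples):
                    return dfa
    return None

# Toy run: the target language "even number of a's" has a 2-state DFA,
# so the paper's bound says strings of length <= 2n - 2 = 2 suffice.
examples = [(s, s.count("a") % 2 == 0)
            for length in range(3)
            for s in map("".join, product("ab", repeat=length))]
dfa = mci_dfa(examples)
```

Because the ground truth has $n = 2$ states and the training set contains all strings of length at most $2n - 2 = 2$, any minimal consistent DFA returned here must be equivalent to the ground truth, so it classifies arbitrarily long strings correctly.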