🤖 AI Summary
This work investigates inherent limitations of pretrained large language models (LLMs) in sequence-length generalization, focusing on systematic asymmetries between *inductive* (rightward) and *anti-inductive* (leftward) capabilities in retrieval and copying tasks. Using C-RASP formalization, mechanistic interpretability analysis, a custom length-generalization benchmark, and controlled fine-tuning experiments, we empirically establish, for the first time, that mainstream Transformer LLMs exhibit a consistent inductive bias: they generalize robustly to longer sequences in rightward patterns but fail on leftward ones. This asymmetry stems from structural disparities in the strength of the underlying circuits and persists despite large-scale pretraining. Crucially, we demonstrate that the bias can be reliably corrected via lightweight, theoretically grounded fine-tuning. Our findings expose a fundamental length-generalization bottleneck of Transformers with concrete reliability implications for real-world tasks, providing empirical guidance for model architecture selection, alignment design, and generalization enhancement.
📝 Abstract
Transformers have theoretical limitations in modeling certain sequence-to-sequence tasks, yet it remains largely unclear whether these limitations play a role in large-scale pretrained LLMs, or whether LLMs effectively overcome these constraints in practice due to the scale of both the models themselves and their pretraining data. We explore how these architectural constraints manifest after pretraining, by studying a family of *retrieval* and *copying* tasks inspired by Liu et al. [2024]. We use the recently proposed C-RASP framework for studying length generalization [Huang et al., 2025b] to provide guarantees for each of our settings. Empirically, we observe an *induction-versus-anti-induction* asymmetry, where pretrained models are better at retrieving tokens to the right (induction) than to the left (anti-induction) of a query token. This asymmetry disappears upon targeted fine-tuning if length generalization is guaranteed by theory. Mechanistic analysis reveals that this asymmetry is connected to differences in the strength of induction versus anti-induction circuits within pretrained Transformers. We validate our findings through practical experiments on real-world tasks demonstrating reliability risks. Our results highlight that pretraining selectively enhances certain Transformer capabilities, but does not overcome fundamental length-generalization limits.
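To make the induction-versus-anti-induction distinction concrete, here is a minimal sketch (not code from the paper; the function name and sequence are illustrative) of the two retrieval probes the abstract describes: given a query token, an inductive probe asks for the token immediately to its right, while an anti-inductive probe asks for the token immediately to its left.

```python
def retrieve(sequence, query, direction="right"):
    """Return the neighbor of the first occurrence of `query`.

    direction="right" models the inductive probe (next token),
    direction="left" models the anti-inductive probe (previous token).
    """
    i = sequence.index(query)
    return sequence[i + 1] if direction == "right" else sequence[i - 1]

seq = list("abqcd")
print(retrieve(seq, "q", "right"))  # induction: token after 'q' -> 'c'
print(retrieve(seq, "q", "left"))   # anti-induction: token before 'q' -> 'b'
```

The paper's finding is that pretrained models handle the first call far more robustly than the second as sequences grow beyond the lengths seen in training.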