AI Summary
This work addresses the lack of theoretical guarantees for length generalization in Transformers when both sequence length and vocabulary size scale simultaneously in planning verification tasks. We propose the C*-RASP theoretical framework, which provides the first provable length generalization guarantee for decoder-only Transformers under this setting. By leveraging structural properties from classical AI planning domains, we characterize a class of planning domains that Transformers can reliably verify and identify key structural features governing their generalization capability. Our theoretical findings are empirically validated, demonstrating a strong correlation between the identified structural attributes and model performance.
Abstract
Transformers have shown inconsistent success in AI planning tasks, and theoretical understanding of when generalization should be expected has been limited. We take important steps toward addressing this gap by analyzing the ability of decoder-only models to verify whether a given plan correctly solves a given planning instance. To analyze the general setting where the number of objects -- and thus the effective input alphabet -- grows at test time, we introduce C*-RASP, an extension of C-RASP designed to establish length generalization guarantees for Transformers under simultaneous growth in sequence length and vocabulary size. Our results identify a large class of classical planning domains for which Transformers can provably learn to verify long plans, as well as structural properties that significantly affect the learnability of length-generalizable solutions. Empirical experiments corroborate our theory.