Unlocking Out-of-Distribution Generalization in Transformers via Recursive Latent Space Reasoning

📅 2025-10-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Transformers exhibit weak systematic and compositional out-of-distribution (OOD) generalization, which particularly hinders performance on complex reasoning tasks. Method: We propose a recursive latent-space reasoning framework built on a standard Transformer architecture, integrating four novel mechanisms: (1) an input-adaptive recurrent structure, (2) algorithm-level supervision signals, (3) discrete bottleneck-constrained latent representations, and (4) an explicit error-feedback-driven correction module. The resulting model is modular, scalable, and interpretable. Contributions/Results: Evaluated on compositional arithmetic reasoning tasks (e.g., GSM8K-style problems) over computational graphs, our approach achieves significant gains in OOD generalization. Interpretability analysis confirms that the mechanisms jointly induce robust, structured reasoning paths, demonstrating improved logical compositionality. This work establishes a new paradigm for enhancing the logical generalization capability of large language models.
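The paper's implementation is not shown on this page, but the benchmark it describes (modular arithmetic over computational graphs) and the idea of input-adaptive recurrence can be illustrated with a toy evaluator that sweeps the graph until every node resolves, so deeper out-of-distribution graphs simply trigger more iterations. The graph encoding and the modulus below are illustrative assumptions, not the paper's actual setup.

```python
# Toy illustration of the task domain: modular arithmetic on a
# computational graph, evaluated with input-adaptive recurrence
# (the loop runs as many passes as the graph's depth requires).
# The graph encoding and MOD value are illustrative assumptions.

MOD = 7  # hypothetical modulus, not taken from the paper

def eval_graph(graph, inputs):
    """graph maps node -> (op, operand_a, operand_b); inputs maps leaf -> int."""
    values = dict(inputs)
    # Input-adaptive recurrence: keep sweeping until every node resolves,
    # so deeper (OOD) graphs simply get more iterations.
    while len(values) < len(inputs) + len(graph):
        for node, (op, a, b) in graph.items():
            if node not in values and a in values and b in values:
                if op == "add":
                    values[node] = (values[a] + values[b]) % MOD
                elif op == "mul":
                    values[node] = (values[a] * values[b]) % MOD
    return values

# A depth-2 graph: n1 = x + y, n2 = n1 * z (all mod 7)
graph = {"n1": ("add", "x", "y"), "n2": ("mul", "n1", "z")}
print(eval_graph(graph, {"x": 3, "y": 5, "z": 4})["n2"])  # (3+5)%7 = 1; (1*4)%7 = 4
```

A model trained on depth-2 graphs like this one would be tested on much deeper graphs to probe the compositional OOD generalization the paper targets.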

📝 Abstract
Systematic, compositional generalization beyond the training distribution remains a core challenge in machine learning -- and a critical bottleneck for the emergent reasoning abilities of modern language models. This work investigates out-of-distribution (OOD) generalization in Transformer networks, using a GSM8K-style task of modular arithmetic on computational graphs as a testbed. We introduce and explore a set of four architectural mechanisms aimed at enhancing OOD generalization: (i) input-adaptive recurrence; (ii) algorithmic supervision; (iii) anchored latent representations via a discrete bottleneck; and (iv) an explicit error-correction mechanism. Collectively, these mechanisms yield an architectural approach for native and scalable latent space reasoning in Transformer networks with robust algorithmic generalization capabilities. We complement these empirical results with a detailed mechanistic interpretability analysis that reveals how these mechanisms give rise to robust OOD generalization abilities.
Problem

Research questions and friction points this paper is trying to address.

Enhancing out-of-distribution generalization in Transformers
Addressing compositional generalization beyond training data
Improving algorithmic reasoning capabilities in language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Input-adaptive recurrence enhances generalization
Algorithmic supervision improves reasoning capabilities
Discrete bottleneck anchors latent representations
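As a rough intuition for the third mechanism, a discrete bottleneck can be sketched as nearest-neighbor quantization against a fixed codebook, which snaps a drifting continuous latent state back onto an anchored discrete code; the codebook contents and the squared-L2 distance below are illustrative assumptions, not the paper's actual design.

```python
# Minimal sketch of a discrete bottleneck: a continuous latent vector is
# snapped to its nearest codebook entry, "anchoring" the representation.
# Codebook contents and the squared-L2 distance are illustrative assumptions.

CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # hypothetical codes

def quantize(latent):
    """Return the codebook entry nearest to `latent` (squared L2 distance)."""
    return min(CODEBOOK, key=lambda c: sum((x - y) ** 2 for x, y in zip(c, latent)))

# A noisy latent near (1, 0) is anchored back to the discrete code (1.0, 0.0)
print(quantize((0.9, 0.12)))  # -> (1.0, 0.0)
```

Because every reasoning step passes through such a bottleneck, small representation errors cannot accumulate across recurrent iterations, which is one plausible route to the robust OOD behavior the summary describes.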
Awni Altabaa
Department of Statistics & Data Science, Yale University
Siyu Chen
Department of Statistics & Data Science, Yale University
John Lafferty
Yale University
Machine Learning
Zhuoran Yang
Yale University
machine learning, optimization, reinforcement learning, statistics