Complexity Control Facilitates Reasoning-Based Compositional Generalization in Transformers

📅 2025-01-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
It remains unclear whether Transformers achieve compositional generalization through rule-based reasoning or merely via memorization of specific instances. Method: We introduce a complexity-control framework grounded in low-complexity bias as the key mechanism enabling reasoning-driven generalization. Leveraging information-path masking, multi-dimensional complexity metrics, and cross-modal interpretability analysis (spanning image generation and NLP), we establish, for the first time, a theoretical link between complexity modulation and neuronal condensation. Our strategy explicitly steers models toward learning transferable primitive rules rather than superficial patterns. Contribution/Results: Evaluated across multiple real-world datasets, our approach consistently enhances out-of-distribution compositional generalization performance. It provides a novel paradigm for understanding the generalization mechanisms of large language models, bridging formal complexity theory with empirical neural behavior.

Technology Category

Application Category

📝 Abstract
Transformers have demonstrated impressive capabilities across various tasks, yet their performance on compositional problems remains a subject of debate. In this study, we investigate the internal mechanisms underlying Transformers' behavior in compositional tasks. We find that complexity control strategies significantly influence whether the model learns primitive-level rules that generalize out-of-distribution (reasoning-based solutions) or relies solely on memorized mappings (memory-based solutions). By applying masking strategies to the model's information circuits and employing multiple complexity metrics, we reveal distinct internal working mechanisms associated with different solution types. Further analysis reveals that reasoning-based solutions exhibit a lower complexity bias, which aligns with the well-studied neuron condensation phenomenon. This lower complexity bias is hypothesized to be the key factor enabling these solutions to learn reasoning rules. We validate these conclusions across multiple real-world datasets, including image generation and natural language processing tasks, confirming the broad applicability of our findings.
Problem

Research questions and friction points this paper is trying to address.

Transformer
Complex Problem Solving
Learning Mechanism
Innovation

Methods, ideas, or system contributions that make the work stand out.

Complexity Control
Neuronal Pruning Phenomenon
Transformer Optimization
🔎 Similar Papers
No similar papers found.
Zhongwang Zhang
Zhongwang Zhang
Shanghai Jiao Tong University
P
Pengxiao Lin
Institute of Natural Sciences, School of Mathematical Sciences, MOE-LSC, Shanghai Jiao Tong University, Shanghai, 200240, China
Z
Zhiwei Wang
Institute of Natural Sciences, School of Mathematical Sciences, MOE-LSC, Shanghai Jiao Tong University, Shanghai, 200240, China
Yaoyu Zhang
Yaoyu Zhang
Shanghai Jiao Tong University
Deep Learning Theory
Z
Zhi-Qin John Xu
Institute of Natural Sciences, School of Mathematical Sciences, MOE-LSC, Shanghai Jiao Tong University, Shanghai, 200240, China; School of Artificial Intelligence, Shanghai Jiao Tong University, Center for LLM, Institute for Advanced Algorithms Research, Shanghai Seres Information Technology Co., Ltd, Shanghai 200040, China