🤖 AI Summary
This work addresses the limited systematic generalization of large language models (LLMs) in abstract spatial reasoning. We propose a meta-learning framework designed to enhance compositional generalization. To this end, we introduce SYGAR, the first benchmark explicitly targeting generalization over compositions of geometric transformations, requiring models to generalize from known atomic transformations (e.g., translation, rotation) to unseen composite transformations (e.g., translation followed by rotation). Crucially, we extend the compositional meta-learning paradigm from linguistic tasks to non-linguistic symbolic spatial reasoning for the first time. Our approach employs a Transformer-based encoder-decoder architecture trained on synthetic data with a composition-aware objective. Experiments demonstrate that our method significantly outperforms strong baselines, including o3-mini, GPT-4o, and Gemini 2.0 Flash, on SYGAR, achieving robust systematic generalization to novel transformation combinations. This validates the broad applicability and effectiveness of compositional meta-learning for abstract spatial reasoning.
📝 Abstract
Systematic generalization refers to the capacity to understand and generate novel combinations from known components. Despite recent progress by large language models (LLMs) across various domains, these models often fail to extend their knowledge to novel compositional scenarios, revealing notable limitations in systematic generalization. There has been an ongoing debate about whether neural networks possess the capacity for systematic generalization, with recent studies suggesting that meta-learning approaches designed for compositionality can significantly enhance this ability. However, these insights have largely been confined to linguistic problems, leaving their applicability to other tasks an open question. In this study, we extend the approach of meta-learning for compositionality to the domain of abstract spatial reasoning. To this end, we introduce *SYGAR*, a dataset designed to evaluate the capacity of models to systematically generalize from known geometric transformations (e.g., translation, rotation) of two-dimensional objects to novel combinations of these transformations (e.g., translation+rotation). Our results show that a transformer-based encoder-decoder model, trained via meta-learning for compositionality, can systematically generalize to previously unseen transformation compositions, significantly outperforming state-of-the-art LLMs, including o3-mini, GPT-4o, and Gemini 2.0 Flash, which fail to exhibit similar systematic behavior. Our findings highlight the effectiveness of meta-learning in promoting systematicity beyond linguistic tasks, suggesting a promising direction toward more robust and generalizable models.
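To make the task concrete, the kind of composition the abstract describes (an atomic translation followed by an atomic rotation of a 2D object) can be sketched as below. This is a minimal illustrative example, not the paper's actual code: the grid representation, function names, and the specific composition are assumptions for illustration only.

```python
import numpy as np

def translate(grid, dr, dc):
    """Shift all cells by (dr, dc); cells moved off the grid are dropped."""
    out = np.zeros_like(grid)
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                out[nr, nc] = grid[r, c]
    return out

def rotate90(grid):
    """Rotate the grid 90 degrees clockwise."""
    return np.rot90(grid, k=-1)

def compose(*fns):
    """Apply transformations left to right: compose(f, g)(x) == g(f(x))."""
    def apply(grid):
        for f in fns:
            grid = f(grid)
        return grid
    return apply

# A 3x3 grid with a single marked cell at (0, 0).
g = np.zeros((3, 3), dtype=int)
g[0, 0] = 1

# Atomic transformation: shift one step right.
shift_right = lambda x: translate(x, 0, 1)

# Composite transformation: translation followed by rotation.
translate_then_rotate = compose(shift_right, rotate90)
result = translate_then_rotate(g)
```

Under this sketch, a model that has seen `shift_right` and `rotate90` individually is tested on inputs produced by their composition, which it has never observed during training.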