🤖 AI Summary
Large language models (LLMs) have not been systematically evaluated on abstract reasoning and generalization, particularly on the Abstraction and Reasoning Corpus (ARC) benchmark. Method: We propose Knowledge Augmentation for Abstract Reasoning (KAAR), a framework that formulates ARC tasks as program synthesis problems. KAAR encodes core knowledge priors in an ontology organized into three dependency-based levels for progressive knowledge injection, and applies a staged reasoning mechanism: after augmenting the priors at each level, it invokes repeated-sampling planning-aided code generation (RSPC) to produce candidate solutions, reducing interference from irrelevant priors. The approach integrates ontological knowledge representation with multi-stage prompt engineering. Contribution/Results: KAAR achieves an average absolute accuracy improvement of 5.0% on ARC, with up to 64.52% relative gain, consistently outperforms non-augmented RSPC, and maintains strong generalization across diverse LLMs.
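To make the RSPC component concrete, below is a minimal sketch of a repeated-sampling planning-aided code-generation loop, assuming a generic `llm` callable that maps a prompt to a completion and ARC tasks given as lists of (input, output) grid pairs. The prompt wording, sampling budget, and helper names are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]  # (input grid, output grid)


def rspc_solve(task_examples: List[Example],
               llm: Callable[[str], str],
               num_samples: int = 8) -> Optional[str]:
    """Repeatedly sample a plan, turn it into a program, and keep the first
    candidate that reproduces every training pair."""
    for _ in range(num_samples):
        # 1) Sample a natural-language plan for the grid transformation.
        plan = llm("Describe step by step how each input grid is transformed "
                   f"into its output grid:\n{task_examples}")
        # 2) Ask the model to implement the plan as `transform(grid)`.
        program = llm("Implement this plan as a Python function "
                      f"`transform(grid)` returning the output grid:\n{plan}")
        # 3) Verify the candidate against all training examples.
        try:
            namespace: dict = {}
            exec(program, namespace)  # defines transform()
            transform = namespace["transform"]
            if all(transform(inp) == out for inp, out in task_examples):
                return program  # first verified program wins
        except Exception:
            continue  # discard candidates that crash or fail to parse
    return None  # no sampled program passed verification
```

The key property of this formulation is that candidates are cheap to verify: any sampled program that fails to reproduce every training pair is discarded, so only hypotheses consistent with the demonstrations survive.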
📝 Abstract
Recent reasoning-oriented LLMs have demonstrated strong performance on challenging tasks such as mathematics and science examinations. However, core cognitive faculties of human intelligence, such as abstract reasoning and generalization, remain underexplored. To address this, we evaluate recent reasoning-oriented LLMs on the Abstraction and Reasoning Corpus (ARC) benchmark, which explicitly demands both faculties. We formulate ARC as a program synthesis task and propose nine candidate solvers. Experimental results show that repeated-sampling planning-aided code generation (RSPC) achieves the highest test accuracy and demonstrates consistent generalization across most LLMs. To further improve performance, we introduce an ARC solver, Knowledge Augmentation for Abstract Reasoning (KAAR), which encodes core knowledge priors within an ontology that classifies priors into three hierarchical levels based on their dependencies. KAAR progressively expands LLM reasoning capacity by gradually augmenting priors at each level, and invokes RSPC to generate candidate solutions after each augmentation stage. This stage-wise reasoning reduces interference from irrelevant priors and improves LLM performance. Empirical results show that KAAR maintains strong generalization and consistently outperforms non-augmented RSPC across all evaluated LLMs, achieving around 5% absolute gains and up to 64.52% relative improvement. Despite these achievements, ARC remains a challenging benchmark for reasoning-oriented LLMs, highlighting avenues for future progress in LLM reasoning.
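For intuition, here is a hedged sketch of how KAAR's stage-wise augmentation could wrap such an RSPC solver, assuming the priors are grouped into three dependency-ordered levels. The level contents, prompt format, and `rspc_solve` interface (as sketched above) are placeholders rather than the paper's actual ontology or code.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical grouping of core knowledge priors into dependency-ordered
# levels; the names and their assignment to levels are illustrative only.
PRIOR_LEVELS: Dict[str, List[str]] = {
    "level_1": ["objectness", "basic geometry and topology"],
    "level_2": ["object relations", "counting and numbers"],
    "level_3": ["goal-directed transformations"],
}


def kaar_solve(task_examples, llm: Callable[[str], str]) -> Optional[str]:
    """Augment prompts with one level of priors at a time and invoke an
    RSPC-style solver (e.g. rspc_solve above) after each stage, stopping at
    the first verified program so deeper priors are only used when needed."""
    accumulated: List[str] = []
    for level, priors in PRIOR_LEVELS.items():
        accumulated.extend(priors)

        def augmented_llm(prompt: str, priors=tuple(accumulated)) -> str:
            # Inject the priors gathered so far ahead of every prompt.
            knowledge = "Relevant core knowledge priors: " + ", ".join(priors)
            return llm(knowledge + "\n\n" + prompt)

        program = rspc_solve(task_examples, augmented_llm)
        if program is not None:
            return program  # solved at this level; skip further augmentation
    return None  # unsolved even with all priors injected
```

Stopping at the first level that yields a verified program is what limits interference: later, potentially irrelevant priors are only introduced when the earlier, simpler augmentations fail.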