Understanding LLMs' Fluid Intelligence Deficiency: An Analysis of the ARC Task

📅 2025-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper systematically diagnoses fundamental capability bottlenecks of large language models (LLMs) in solving novel, abstract reasoning tasks—using the Abstraction and Reasoning Corpus (ARC) as a benchmark—from the cognitive science perspective of fluid intelligence. Method: Through controlled experiments, multi-model comparisons, ablation studies of prompt engineering, and format generalization tests, the study isolates and quantifies three core limitations: weak compositional skill acquisition, difficulty adapting to abstract input formats, and inherent left-to-right dependency in autoregressive decoding. Contribution/Results: The work provides the first quantitative characterization of how these constraints degrade ARC performance, demonstrating that state-of-the-art LLMs significantly underperform human baselines. To foster reproducible research, the authors open-source all data, prompts, and evaluation code, advancing the development of standardized, cognitively grounded assessment frameworks for fluid intelligence in foundation models.

📝 Abstract
While LLMs have exhibited strong performance on various NLP tasks, it is noteworthy that most of these tasks rely on utilizing the vast amount of knowledge encoded in LLMs' parameters, rather than solving new problems without prior knowledge. In cognitive research, the latter ability is referred to as fluid intelligence, which is considered to be critical for assessing human intelligence. Recent research on fluid intelligence assessments has highlighted significant deficiencies in LLMs' abilities. In this paper, we analyze the challenges LLMs face in demonstrating fluid intelligence through controlled experiments, using the most representative ARC task as an example. Our study reveals three major limitations in existing LLMs: limited ability for skill composition, unfamiliarity with abstract input formats, and the intrinsic deficiency of left-to-right decoding. Our data and code can be found at https://wujunjie1998.github.io/araoc-benchmark.github.io/.
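To make the benchmark concrete, the sketch below shows what an ARC-style instance looks like: a few demonstration pairs of small colored grids, a hidden transformation to infer, and exact-match scoring. This is a minimal illustration, not code from the paper; the flip rule, the grids, and the function names are invented for this example.

```python
# Illustrative ARC-style task (minimal sketch, assumptions noted above).
# Grids are lists of lists of color indices 0-9; the hidden rule here is
# a horizontal mirror of each row.

def flip_horizontal(grid):
    """Apply the (hidden) transformation: mirror each row of the grid."""
    return [list(reversed(row)) for row in grid]

# Few-shot demonstrations a solver would be shown.
demonstrations = [
    ([[1, 0], [2, 3]], [[0, 1], [3, 2]]),
    ([[5, 5, 0], [0, 4, 4]], [[0, 5, 5], [4, 4, 0]]),
]

def is_correct(predicted, target):
    """ARC scoring is exact match: every cell must agree, no partial credit."""
    return predicted == target

test_input = [[7, 0, 0], [0, 8, 0]]
prediction = flip_horizontal(test_input)
print(is_correct(prediction, [[0, 0, 7], [0, 8, 0]]))  # exact-match check
```

Because scoring is all-or-nothing at the grid level, a model that composes two skills imperfectly (e.g., flip then recolor) earns nothing, which is one reason weak skill composition degrades ARC accuracy so sharply.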
Problem

Research questions and friction points this paper is trying to address.

Diagnosing why LLMs that excel at knowledge-driven NLP tasks struggle on novel problems requiring fluid intelligence
Isolating the specific capability bottlenecks behind LLMs' failures on abstract reasoning tasks such as ARC
Quantifying how abstract input formats and left-to-right autoregressive decoding constrain LLMs' problem solving
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cognitively grounded analysis of fluid intelligence in LLMs
Controlled experiments on the ARC task with multi-model comparisons and prompt-engineering ablations
Quantitative study of skill-composition, format-generalization, and decoding-order limitations, with open-sourced data, prompts, and evaluation code