Cognitive Foundations for Reasoning and Their Manifestation in LLMs

📅 2025-11-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work identifies a fundamental cognitive divergence between large language models (LLMs) and humans in complex problem solving: LLMs lack metacognitive monitoring and hierarchical, nested reasoning capabilities, relying instead on shallow, forward-only inference, especially on ill-structured tasks. Method: We develop the first theory-driven taxonomy encompassing 28 cognitive constructs and propose a fine-grained cognitive evaluation framework. Leveraging 170K model reasoning traces and 54 human think-aloud protocols, we integrate behavioral analysis, large-scale log mining, and a meta-analysis of 1,598 cognitive science publications to systematically bridge cognitive science and LLM research. Contribution/Results: We demonstrate that prevailing evaluation methods neglect metacognitive mechanisms. Building on this insight, we design test-time structured reasoning guidance strategies, achieving up to a 60% performance gain on complex tasks. We publicly release a large-scale, multimodal reasoning trace dataset to support reproducible cognitive assessment of foundation models.

📝 Abstract
Large language models solve complex problems yet fail on simpler variants, suggesting they achieve correct outputs through mechanisms fundamentally different from human reasoning. We synthesize cognitive science research into a taxonomy of 28 cognitive elements spanning computational constraints, meta-cognitive controls, knowledge representations, and transformation operations, then analyze their behavioral manifestations in reasoning traces. We propose a fine-grained cognitive evaluation framework and conduct the first large-scale analysis of 170K traces from 17 models across text, vision, and audio modalities, alongside 54 human think-aloud traces, which we make publicly available. Our analysis reveals systematic structural differences: humans employ hierarchical nesting and meta-cognitive monitoring while models rely on shallow forward chaining, with the divergence most pronounced on ill-structured problems. A meta-analysis of 1,598 LLM reasoning papers reveals that the research community concentrates on easily quantifiable behaviors (sequential organization: 55%, decomposition: 60%) while neglecting the meta-cognitive controls (self-awareness: 16%, evaluation: 8%) that correlate with success. Models possess the behavioral repertoires associated with success but fail to deploy them spontaneously. Leveraging these patterns, we develop test-time reasoning guidance that automatically scaffolds successful structures, improving performance by up to 60% on complex problems. By bridging cognitive science and LLM research, we establish a foundation for developing models that reason through principled cognitive mechanisms rather than brittle, spurious shortcuts or memorization, opening new directions both for improving model capabilities and for testing theories of human cognition at scale.
Problem

Research questions and friction points this paper is trying to address.

LLMs fail on simple problems despite solving complex ones
Models rely on shallow forward chaining rather than human-like hierarchical reasoning
Current research neglects meta-cognitive controls crucial for success
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed fine-grained cognitive evaluation framework for models
Proposed test-time reasoning guidance that scaffolds successful reasoning structures
Bridged cognitive science and LLM research for principled reasoning
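The summary does not spell out how the test-time guidance is implemented, but the general idea of scaffolding a model's reasoning structure at inference time can be sketched as prompt construction. The following is a minimal, hypothetical illustration: the scaffold steps, the `scaffold_prompt` function, and the step wording are all assumptions for illustration, not the paper's actual method.

```python
# Hypothetical sketch of test-time reasoning scaffolding: wrap a raw task
# prompt with explicit meta-cognitive steps (decomposition, monitoring,
# evaluation) before sending it to a model. All names here are illustrative.

SCAFFOLD_STEPS = [
    "Decompose: break the problem into ordered sub-goals.",
    "Plan: choose an approach for each sub-goal before solving it.",
    "Monitor: after each step, state whether it advances the current sub-goal.",
    "Evaluate: check the final answer against the original constraints.",
]

def scaffold_prompt(task: str, steps: list[str] = SCAFFOLD_STEPS) -> str:
    """Return the task wrapped in a structured reasoning scaffold."""
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, start=1))
    return f"Task:\n{task}\n\nFollow this reasoning structure:\n{numbered}"

# The scaffolded prompt would then be passed to the model in place of the
# raw task, nudging it toward the hierarchical, monitored structure that
# the paper associates with successful human reasoning.
prompt = scaffold_prompt("A train leaves station A at 9:00 ...")
```

The design choice here is that guidance is purely additive at test time: the model's weights are untouched, and only the prompt structure changes, which matches the paper's framing of guidance as scaffolding rather than retraining.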