Probing the "Psyche" of Large Reasoning Models: Understanding Through a Human Lens

📅 2025-11-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Current large reasoning models (LRMs) lack a cognitively grounded, fine-grained characterization of their atomic reasoning steps, which hinders interpretability and the evaluation of human-like reasoning behavior. Method: We propose a cognition-informed, fine-grained taxonomy of reasoning steps, comprising five groups and seventeen categories derived from human mental processes. To scale annotation, we introduce CAPO, a collaborative framework that integrates expert annotation with LLM-powered automatic annotation. Applying the taxonomy yields a labeled dataset of 277,534 atomic reasoning steps. Contribution/Results: CAPO achieves higher consistency with human experts than baseline methods. Empirical analysis reveals that LRMs' post-answer self-verification remains largely superficial and rarely yields substantive revisions, which suggests that incentivizing comprehensive multi-step reflection may be more effective than simple self-monitoring. This work provides a scalable framework and an empirical foundation for interpretable reasoning modeling and evaluation.

📝 Abstract

Large reasoning models (LRMs) have garnered significant attention from researchers owing to their exceptional capability in addressing complex tasks. Motivated by the observed human-like behaviors in their reasoning processes, this paper introduces a comprehensive taxonomy to characterize atomic reasoning steps and probe the "psyche" of LRM intelligence. Specifically, it comprises five groups and seventeen categories derived from human mental processes, thereby grounding the understanding of LRMs in an interdisciplinary perspective. The taxonomy is then applied for an in-depth understanding of current LRMs, resulting in a distinct labeled dataset that comprises 277,534 atomic reasoning steps. Using this resource, we analyze contemporary LRMs and distill several actionable takeaways for improving training and post-training of reasoning models. Notably, our analysis reveals that prevailing post-answer "double-checks" (self-monitoring evaluations) are largely superficial and rarely yield substantive revisions. Thus, incentivizing comprehensive multi-step reflection, rather than simple self-monitoring, may offer a more effective path forward. To complement the taxonomy, an automatic annotation framework, named CAPO, is proposed to leverage large language models (LLMs) for generating the taxonomy-based annotations. Experimental results demonstrate that CAPO achieves higher consistency with human experts compared to baselines, facilitating a scalable and comprehensive analysis of LRMs from a human cognitive perspective. Together, the taxonomy, CAPO, and the derived insights provide a principled, scalable path toward understanding and advancing LRM reasoning.
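The abstract reports that CAPO's labels are more consistent with human experts than those of baselines, but this page does not say which consistency metric the paper uses. As a rough illustration only, the sketch below assumes each atomic reasoning step receives one category label from a human expert and one from an automatic annotator, and compares them with raw agreement and Cohen's kappa. The function names and the placeholder labels (`cat_a`, `cat_b`) are hypothetical and are not taken from the paper or its taxonomy.

```python
from collections import Counter


def percent_agreement(human, auto):
    """Fraction of steps on which the two annotators assign the same label."""
    if len(human) != len(auto) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(h == a for h, a in zip(human, auto)) / len(human)


def cohens_kappa(human, auto):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    n = len(human)
    p_observed = percent_agreement(human, auto)
    human_counts = Counter(human)
    auto_counts = Counter(auto)
    # Expected agreement if both annotators labeled independently
    # according to their own label frequencies.
    p_expected = sum(
        human_counts[label] * auto_counts[label] for label in human_counts
    ) / (n * n)
    if p_expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels for four reasoning steps (placeholders, not real data).
human_labels = ["cat_a", "cat_a", "cat_b", "cat_b"]
auto_labels = ["cat_a", "cat_a", "cat_b", "cat_a"]

agreement = percent_agreement(human_labels, auto_labels)
kappa = cohens_kappa(human_labels, auto_labels)
```

Raw agreement can look high when one category dominates, so a chance-corrected measure such as kappa is a common companion for comparing an automatic annotator against several baselines. Whether the paper reports it is not stated here.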

Problem

Research questions and friction points this paper is trying to address.

Develop a taxonomy to characterize atomic reasoning steps in LRMs

Analyze LRMs using a labeled dataset of 277,534 reasoning steps

Propose an automatic annotation framework for scalable LRM analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Taxonomy for atomic reasoning steps from human cognition

CAPO framework automates annotation using large language models

Recommends incentivizing comprehensive multi-step reflection over superficial self-monitoring checks

🔎 Similar Papers

Large Language Models Assume People are More Rational than We Really are