Probing the "Psyche" of Large Reasoning Models: Understanding Through a Human Lens

📅 2025-11-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Research on large reasoning models (LRMs) lacks a cognitively grounded, fine-grained characterization of their atomic reasoning steps, which hinders interpretability and the evaluation of human-like reasoning behavior. Method: We propose the first cognition-informed, fine-grained taxonomy of reasoning steps, comprising five primary categories and seventeen subcategories grounded in human cognitive processes. To scale annotation, we introduce CAPO, a collaborative framework that integrates expert annotation with LLM-powered automatic annotation for high efficiency and consistency, and we construct a high-quality dataset of 277,534 annotated samples. Contribution/Results: CAPO achieves significantly higher inter-annotator agreement than baseline methods. Empirical analysis reveals that LRMs' self-verification remains largely superficial, motivating our proposed multi-step deep reflection mechanisms. This work establishes a scalable theoretical framework and an empirical foundation for interpretable reasoning modeling and evaluation.
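The summary's claim of "higher inter-annotator agreement" is typically quantified with a chance-corrected statistic such as Cohen's kappa. As a minimal illustration (not code from the paper, and with hypothetical step-category names), kappa between an automatic annotator and a human expert can be computed like this:

```python
# Illustrative sketch: Cohen's kappa, a standard chance-corrected measure of
# agreement between two annotators labeling the same items.
# The category names below are hypothetical placeholders, not the paper's taxonomy.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical atomic-step labels: human expert vs. automatic annotator.
human = ["deduce", "verify", "recall", "deduce", "plan", "verify"]
auto  = ["deduce", "verify", "recall", "plan",   "plan", "verify"]
print(round(cohens_kappa(human, auto), 3))  # → 0.778
```

Values near 1 indicate near-perfect agreement after discounting chance; comparing kappa across annotation methods is one common way to substantiate a claim like CAPO's.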

📝 Abstract
Large reasoning models (LRMs) have garnered significant attention from researchers owing to their exceptional capability in addressing complex tasks. Motivated by the observed human-like behaviors in their reasoning processes, this paper introduces a comprehensive taxonomy to characterize atomic reasoning steps and probe the "psyche" of LRM intelligence. Specifically, it comprises five groups and seventeen categories derived from human mental processes, thereby grounding the understanding of LRMs in an interdisciplinary perspective. The taxonomy is then applied for an in-depth understanding of current LRMs, resulting in a distinct labeled dataset that comprises 277,534 atomic reasoning steps. Using this resource, we analyze contemporary LRMs and distill several actionable takeaways for improving training and post-training of reasoning models. Notably, our analysis reveals that prevailing post-answer "double-checks" (self-monitoring evaluations) are largely superficial and rarely yield substantive revisions. Thus, incentivizing comprehensive multi-step reflection, rather than simple self-monitoring, may offer a more effective path forward. To complement the taxonomy, an automatic annotation framework, named CAPO, is proposed to leverage large language models (LLMs) for generating the taxonomy-based annotations. Experimental results demonstrate that CAPO achieves higher consistency with human experts compared to baselines, facilitating a scalable and comprehensive analysis of LRMs from a human cognitive perspective. Together, the taxonomy, CAPO, and the derived insights provide a principled, scalable path toward understanding and advancing LRM reasoning.
Problem

Research questions and friction points this paper is trying to address.

Develop a taxonomy to characterize atomic reasoning steps in LRMs
Analyze LRMs using a labeled dataset of 277,534 reasoning steps
Propose an automatic annotation framework for scalable LRM analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Taxonomy for atomic reasoning steps from human cognition
CAPO framework automates annotation using large language models
Incentivizes multi-step reflection over superficial self-monitoring checks
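A collaborative expert-plus-LLM annotation framework like CAPO can be pictured as a triage loop: the LLM labels each reasoning step, and uncertain cases are escalated to human experts. The sketch below is purely hypothetical (the paper does not publish this logic); `llm_label`, the confidence threshold, and the category names are all illustrative placeholders:

```python
# Hypothetical sketch of a collaborative annotation loop (NOT the paper's
# actual CAPO pipeline): an LLM proposes a taxonomy label per reasoning step,
# and low-confidence cases are routed to a human-expert queue.
CATEGORIES = {"plan", "deduce", "recall", "verify", "reflect"}  # placeholder names

def llm_label(step: str) -> tuple[str, float]:
    """Stand-in for an LLM call returning (category, confidence)."""
    # Trivial keyword heuristic so the sketch runs without any API access.
    if "verify" in step or "check" in step:
        return "verify", 0.9
    return "deduce", 0.6

def annotate(steps, threshold=0.8):
    auto_labeled, needs_expert = [], []
    for step in steps:
        label, conf = llm_label(step)
        if label in CATEGORIES and conf >= threshold:
            auto_labeled.append((step, label))   # accepted automatically
        else:
            needs_expert.append(step)            # escalated to human annotators
    return auto_labeled, needs_expert

auto_labeled, queue = annotate(["Let me verify the sum.", "So x = 3."])
print(len(auto_labeled), len(queue))  # → 1 1
```

Splitting work this way is what lets such a framework scale to hundreds of thousands of annotated steps while keeping expert effort focused on the ambiguous cases.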
Yuxiang Chen
Associate Professor of Building Engineering, University of Alberta
High-performance buildings · Energy efficiency · Renewable energy systems
Zuohan Wu
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Ziwei Wang
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Xiangning Yu
Tianjin University, Tianjin, China
Xujia Li
The Hong Kong University of Science and Technology, Hong Kong SAR, China
Linyi Yang
Southern University of Science and Technology
Natural Language Processing · Machine Learning · AI for Research
Mengyue Yang
Lecturer, University of Bristol
Causality · Trustworthiness
Jun Wang
University College London, London, United Kingdom
Lei Chen
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China