Your thoughts tell who you are: Characterize the reasoning patterns of LRMs

📅 2025-09-28
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Current evaluations of large reasoning models (LRMs) rely heavily on macro-level metrics (e.g., accuracy, step count) and lack systematic, fine-grained characterization of the models' intrinsic reasoning patterns. Method: The authors propose LOT (LLM-proposed Open Taxonomy), the first interpretable, automatically generated natural-language reasoning taxonomy. LOT is constructed from reasoning traces on mathematical, scientific, and programming tasks, enabling fine-grained cognitive modeling of 12 open-source LRMs. It integrates generative feature extraction, empirical distribution modeling, and iterative classification to capture model-specific reasoning behaviors. Contribution/Results: LOT achieves 80-100% model attribution accuracy and enables reasoning-style alignment for smaller models, boosting Qwen3's GPQA accuracy by 3.3-5.7%. This work establishes the first open, reasoning-difference-aware classification framework for LRMs, advancing model diagnosis, knowledge distillation, and controllable reasoning.

๐Ÿ“ Abstract
Current comparisons of large reasoning models (LRMs) focus on macro-level statistics such as task accuracy or reasoning length. Whether different LRMs reason differently remains an open question. To address this gap, we introduce the LLM-proposed Open Taxonomy (LOT), a classification method that uses a generative language model to compare reasoning traces from two LRMs and articulate their distinctive features in words. LOT then models how these features predict the source LRM of a reasoning trace based on their empirical distributions across LRM outputs. Iterating this process over a dataset of reasoning traces yields a human-readable taxonomy that characterizes how models think. We apply LOT to compare the reasoning of 12 open-source LRMs on tasks in math, science, and coding. LOT identifies systematic differences in their thoughts, achieving 80-100% accuracy in distinguishing reasoning traces from LRMs that differ in scale, base model family, or objective domain. Beyond classification, LOT's natural-language taxonomy provides qualitative explanations of how LRMs think differently. Finally, in a case study, we link the reasoning differences to performance: aligning the reasoning style of smaller Qwen3 models with that of the largest Qwen3 during test time improves their accuracy on GPQA by 3.3-5.7%.
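The loop the abstract describes (propose a distinguishing feature, record its empirical distribution under each model, classify held-out traces, iterate) can be sketched as follows. This is a hedged toy sketch, not the authors' code: `propose_feature` stands in for the generative LM that reads trace pairs and articulates a feature in words; here it is a hard-coded keyword predicate so the sketch runs end to end.

```python
# Toy sketch of the LOT loop (assumptions labeled; not the authors' implementation).

def propose_feature(traces_a, traces_b):
    """Placeholder for the LLM step: return a named, checkable feature.
    In LOT this would be a natural-language feature proposed by a generative LM."""
    return ("mentions self-verification", lambda trace: "verify" in trace)

def feature_rate(check, traces):
    """Empirical probability that a feature fires on a model's traces."""
    return sum(check(t) for t in traces) / len(traces)

def build_taxonomy(traces_a, traces_b, rounds=1):
    """Iterate: propose a feature, then record its empirical distribution
    under each model. The accumulated (name, check, p_a, p_b) tuples form
    a human-readable taxonomy."""
    taxonomy = []
    for _ in range(rounds):
        name, check = propose_feature(traces_a, traces_b)
        taxonomy.append((name, check,
                         feature_rate(check, traces_a),
                         feature_rate(check, traces_b)))
    return taxonomy

def classify(trace, taxonomy):
    """Naive-Bayes-style attribution: which model's empirical feature
    distributions better explain the features observed in this trace."""
    score_a = score_b = 1.0
    for _name, check, p_a, p_b in taxonomy:
        fired = check(trace)
        score_a *= p_a if fired else 1.0 - p_a
        score_b *= p_b if fired else 1.0 - p_b
    return "A" if score_a >= score_b else "B"
```

With real LLM-proposed features, each feature name is a natural-language description, so the accumulated taxonomy doubles as a qualitative explanation of how the two models think differently.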
Problem

Research questions and friction points this paper is trying to address.

Characterizing distinct reasoning patterns across large reasoning models
Developing a taxonomy that differentiates reasoning traces between LRMs
Linking reasoning style differences to model performance improvements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative language model classifies reasoning traces
Iterative process yields human-readable taxonomy
Natural-language taxonomy explains model reasoning differences
Yida Chen
Harvard University
Yuning Mao
Meta Superintelligence Labs
Natural Language Processing, Generative AI
Xianjun Yang
Meta Superintelligence Labs
Suyu Ge
Meta Superintelligence Labs
Shengjie Bi
Meta Superintelligence Labs
Lijuan Liu
Meta Superintelligence Labs
Saghar Hosseini
Senior Research Scientist at Meta
Responsible AI, Natural Language Processing, Deep Learning
Liang Tan
Meta Superintelligence Labs
Yixin Nie
Meta, UNC Chapel Hill
Natural Language Processing, Machine Learning
Shaoliang Nie
Meta
Language Model, Explainability, Visualization