🤖 AI Summary
This work systematically investigates the reasoning mechanisms of large reasoning models (e.g., OpenAI o1, DeepSeek R1) to clarify where their capabilities originate, where they remain effective, and which common beliefs about them are misconceptions. We propose the first unified analytical framework integrating behavioral evaluation, architectural inversion, chain-of-thought tracing, and computational trajectory visualization. Empirical analysis reveals that these models' reasoning proficiency stems primarily from search-augmented inference and strategic computation delay rather than implicit logical deduction. We characterize their genuine strengths and systematic failure modes in mathematical and symbolic reasoning, refuting the widespread misconception that they possess intrinsic logical reasoning ability. Our findings establish an interpretable benchmark and methodological foundation for designing and evaluating trustworthy reasoning models.
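As one way to picture "search-augmented inference" and "strategic computation delay", here is a minimal best-of-N sketch: sample several candidate chains of thought and keep the one a verifier scores highest, spending extra inference-time compute instead of relying on a single forward pass. This is an illustrative assumption on our part, not the paper's method; `generate_candidate` and `score` are hypothetical stand-ins for an LLM sampler and a verifier/reward model.

```python
# Minimal sketch of search-augmented inference (best-of-N with a verifier).
# All names here (generate_candidate, score) are hypothetical placeholders,
# not APIs from the paper or from any specific model provider.
import random

def generate_candidate(prompt: str, temperature: float = 1.0) -> str:
    """Stand-in for one sampled chain of thought; a real system would call an LLM."""
    return f"candidate reasoning for {prompt!r} (t={temperature}, seed={random.random():.3f})"

def score(candidate: str) -> float:
    """Stand-in for a verifier/reward model that rates a candidate solution."""
    return random.random()

def search_augmented_answer(prompt: str, n: int = 16) -> str:
    """Sample n candidates and keep the best-scoring one.

    The added latency is the 'computation delay': accuracy is bought with
    search over samples, not with implicit one-shot logical deduction.
    """
    candidates = [generate_candidate(prompt) for _ in range(n)]
    return max(candidates, key=score)

if __name__ == "__main__":
    print(search_augmented_answer("Prove that the sum of two even integers is even."))
```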
📝 Abstract
We provide a broad unifying perspective on the recent breed of large reasoning models such as OpenAI o1 and DeepSeek R1, including their promise, sources of power, misconceptions, and limitations.