🤖 AI Summary
Large Reasoning Models (LRMs) exhibit diverse multi-step reasoning behaviors in code generation, yet the relationship between their reasoning patterns and generated code quality remains poorly understood.
Method: We propose the first taxonomy of LRM reasoning behaviors, comprising four phases and 15 fine-grained action types, based on human-annotated reasoning traces. We conduct cross-model (e.g., Qwen3, DeepSeek-R1-7B, o3) and cross-task empirical analysis to characterize reasoning dynamics.
Contribution/Results: We identify systematic differences in reasoning paths: Qwen3 adopts iterative refinement, whereas DeepSeek-R1-7B follows a predominantly linear trajectory. Critical actions, including unit test generation and scaffolding construction, significantly improve functional correctness. Moreover, context-aware prompting effectively steers reasoning toward higher-quality paths. Our findings provide both theoretical insights into LRM reasoning mechanisms and practical guidance for prompt engineering and reliability enhancement in code generation.
📝 Abstract
Many large language models (LLMs) are currently used for software engineering tasks such as code generation. More advanced models, known as large reasoning models (LRMs) and exemplified by OpenAI's o3, DeepSeek R1, and Qwen3, have demonstrated the capability to perform multi-step reasoning. Despite these advances, little attention has been paid to systematically analyzing the reasoning patterns these models exhibit and how such patterns influence the quality of the generated code. This paper presents a comprehensive study investigating the reasoning behavior of LRMs during code generation. We prompted several state-of-the-art LRMs of varying sizes with code generation tasks and applied open coding to manually annotate their reasoning traces. From this analysis, we derive a taxonomy of LRM reasoning behaviors encompassing 15 reasoning actions across four phases.
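To make the annotation scheme concrete, here is a minimal sketch of how a labeled reasoning trace might be represented. The phase names and the `restate_requirements` action are our own placeholders (the abstract does not enumerate the four phases or all 15 actions); only scaffolding, unit test creation, and flaw detection are actions the paper itself names.

```python
from dataclasses import dataclass

@dataclass
class ReasoningAction:
    phase: str   # one of the taxonomy's four phases
    action: str  # one of its 15 fine-grained action types
    span: str    # excerpt of the reasoning trace this label covers

# Hypothetical annotation of one reasoning trace. Phase names and
# "restate_requirements" are placeholders; scaffolding, unit test
# creation, and flaw detection are actions named in the paper.
trace = [
    ReasoningAction("understanding", "restate_requirements",
                    "The task asks for a function that parses..."),
    ReasoningAction("planning", "scaffolding",
                    "I'll start from a skeleton: a parser, then..."),
    ReasoningAction("implementation", "unit_test_creation",
                    "Quick check: assert evaluate('1+2') == 3"),
    ReasoningAction("verification", "flaw_detection",
                    "Wait, this breaks on empty input..."),
]
```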
Our empirical study based on this taxonomy yields several findings. First, we identify common reasoning patterns, showing that LRMs generally follow a human-like coding workflow, with more complex tasks eliciting additional actions such as scaffolding, flaw detection, and style checks. Second, we compare reasoning across models, finding that Qwen3 reasons iteratively while DeepSeek-R1-7B follows a more linear, waterfall-like approach. Third, we analyze the relationship between reasoning and code correctness, showing that actions such as unit test creation and scaffold generation strongly support functional correctness, and that LRMs adapt their strategies to the task context. Finally, we evaluate lightweight prompting strategies informed by these findings, demonstrating the potential of context- and reasoning-oriented prompts to improve LRM-generated code. Our results offer insights and practical implications for advancing automatic code generation.
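The abstract does not reproduce the prompt templates that were evaluated, but a reasoning-oriented prompt in this spirit would steer the model toward the high-value actions identified above (scaffold generation and unit test creation). A minimal sketch, assuming the `openai` Python client and an o-series model; the template wording is illustrative, not the paper's:

```python
from openai import OpenAI

# Illustrative template, not the paper's actual prompt: it nudges
# the model toward scaffolding and unit test creation before coding.
REASONING_PROMPT = """Solve the following coding task.
Before writing the final solution:
1. Sketch a scaffold (function signatures, key data structures).
2. Write two or three unit tests the solution must pass.
3. Implement the solution and check it against your tests.

Task:
{task}
"""

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate(task: str, model: str = "o3-mini") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": REASONING_PROMPT.format(task=task)}],
    )
    return response.choices[0].message.content
```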