ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models

📅 2025-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) enhanced with chain-of-thought (CoT) reasoning sometimes generate overly concise reasoning traces, degrading mathematical problem-solving performance. Method: We identify a linear direction in the hidden representation space that encodes reasoning length and establish its causal link to the "overly brief thinking" bias. Based on this, we propose ThinkEdit—a parameter-efficient intervention that identifies critical attention heads via directional analysis and modifies only their output projection weights (about 0.1% of total parameters), requiring no training or fine-tuning to selectively suppress short-reasoning tendencies. Contribution/Results: ThinkEdit offers strong interpretability and fine-grained controllability. On mathematical reasoning benchmarks, it improves overall accuracy by 2.43% and accuracy on short-reasoning samples by 5.44%, significantly mitigating performance degradation caused by insufficient reasoning.

📝 Abstract
Recent studies have shown that Large Language Models (LLMs) augmented with chain-of-thought (CoT) reasoning demonstrate impressive problem-solving abilities. However, in this work, we identify a recurring issue where these models occasionally generate overly short reasoning, leading to degraded performance on even simple mathematical problems. Specifically, we investigate how reasoning length is embedded in the hidden representations of reasoning models and its impact on accuracy. Our analysis reveals that reasoning length is governed by a linear direction in the representation space, allowing us to induce overly short reasoning by steering the model along this direction. Building on this insight, we introduce ThinkEdit, a simple yet effective weight-editing approach to mitigate the issue of overly short reasoning. We first identify a small subset of attention heads (approximately 2%) that predominantly drive short reasoning behavior. We then edit the output projection weights of these heads to suppress the short reasoning direction. With changes to only 0.1% of the model's parameters, ThinkEdit effectively reduces overly short reasoning and yields notable accuracy gains for short reasoning outputs (+5.44%), along with an overall improvement across multiple math benchmarks (+2.43%). Our findings provide new mechanistic insights into how reasoning length is controlled within LLMs and highlight the potential of fine-grained model interventions to improve reasoning quality. Our code is available at https://github.com/Trustworthy-ML-Lab/ThinkEdit
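The weight edit described in the abstract—suppressing the short-reasoning direction in selected heads' output projections—can be sketched as a rank-one projection. This is a minimal NumPy sketch, assuming a unit direction `d` extracted from hidden representations; the function name, the exact update rule `W' = (I − α·dd^T)W`, and the `alpha` scaling are illustrative assumptions, not the paper's released implementation:

```python
import numpy as np

def edit_head_output_projection(W_o, d, alpha=1.0):
    """Remove the component of one attention head's output projection
    along the short-reasoning direction d.

    W_o   : (d_model, d_head) output projection slice for one head
    d     : (d_model,) direction encoding "short reasoning"
    alpha : edit strength (1.0 = full projection; hypothetical knob)
    """
    d = d / np.linalg.norm(d)  # work with a unit direction
    # W' = (I - alpha * d d^T) W_o, i.e. subtract the rank-one component
    return W_o - alpha * np.outer(d, d @ W_o)

# Toy check: after a full-strength edit, the head can no longer write
# any component along d into the residual stream.
rng = np.random.default_rng(0)
W_o = rng.standard_normal((8, 4))
d = rng.standard_normal(8)
W_edited = edit_head_output_projection(W_o, d)
residual = (d / np.linalg.norm(d)) @ W_edited  # component along d
print(np.allclose(residual, 0.0))  # → True
```

In a real model this edit would be applied only to the roughly 2% of heads the paper's directional analysis flags, which is how the total parameter change stays near 0.1%.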
Problem

Research questions and friction points this paper is trying to address.

Mitigate overly short reasoning in LLMs
Identify attention heads driving short reasoning
Edit model weights to improve reasoning accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Identify attention heads driving short reasoning
Edit output weights to suppress short reasoning
Improve accuracy with minimal parameter changes
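The directional analysis underlying these contributions—steering hidden states along a linear reasoning-length direction to induce shorter or longer reasoning—can be illustrated with a small sketch. The function name and sign convention (positive strength pushes toward short reasoning) are hypothetical:

```python
import numpy as np

def steer_hidden(h, d, strength):
    """Shift hidden states along the reasoning-length direction d.

    h        : (seq_len, d_model) hidden states at some layer
    d        : (d_model,) linear direction encoding reasoning length
    strength : signed steering magnitude (sign convention assumed)
    """
    d = d / np.linalg.norm(d)  # steer along the unit direction
    return h + strength * d

# Toy check: every token's hidden state moves the same distance along d.
rng = np.random.default_rng(1)
h = rng.standard_normal((5, 16))
d = rng.standard_normal(16)
h_short = steer_hidden(h, d, 4.0)
shift = (h_short - h) @ (d / np.linalg.norm(d))
print(np.allclose(shift, 4.0))  # → True
```

In the paper's analysis, steering of this kind is what establishes the causal link between the direction and overly short reasoning; the weight edit then removes that direction permanently rather than intervening at inference time.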