Reforming the Mechanism: Editing Reasoning Patterns in LLMs with Circuit Reshaping

📅 2026-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the unreliability of large language models stemming from deficiencies in reasoning, a challenge that existing editing methods struggle to resolve without compromising unrelated model competencies. To this end, the paper introduces a new paradigm termed "reasoning editing," realized in a framework called REdit, which enables precise intervention on specific reasoning patterns by actively reshaping neural circuits. The approach is grounded in the newly identified Circuit-Interference Law and integrates contrastive circuit reshaping, meta-contrastive learning, and a dual-level protection mechanism to disentangle target circuits, extend transferability to novel patterns, and preserve unrelated functionalities. Experiments on Qwen-2.5-3B demonstrate that REdit significantly outperforms baseline methods on propositional logic and mathematical reasoning tasks, achieving both high locality of edits and strong generality.

📝 Abstract
Large language models (LLMs) often exhibit flawed reasoning ability that undermines reliability. Existing approaches to improving reasoning typically treat it as a general and monolithic skill, applying broad training which is inefficient and unable to target specific reasoning errors. We introduce Reasoning Editing, a paradigm for selectively modifying specific reasoning patterns in LLMs while preserving other reasoning pathways. This task presents a fundamental trade-off between Generality, the ability of an edit to generalize across different tasks sharing the same reasoning pattern, and Locality, the ability to preserve other reasoning capabilities. Through systematic investigation, we uncover the Circuit-Interference Law: Edit interference between reasoning patterns is proportional to the overlap of their neural circuits. Guided by this principle, we propose REdit, the first framework to actively reshape neural circuits before editing, thereby modulating interference between reasoning patterns and mitigating the trade-off. REdit integrates three components: (i) Contrastive Circuit Reshaping, which directly addresses the generality-locality trade-off by disentangling overlapping circuits; (ii) Meta-Contrastive Learning, which extends transferability to novel reasoning patterns; and (iii) Dual-Level Protection, which preserves preexisting abilities by constraining reshaping update directions and regularizing task-level predictions. Extensive experiments with Qwen-2.5-3B on propositional logic reasoning tasks across three difficulty levels demonstrate that REdit consistently achieves superior generality and locality compared to baselines, with additional validation in mathematics showing broader potential. Our code is available at https://github.com/LzyFischer/REdit.
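The abstract's Circuit-Interference Law states that edit interference between two reasoning patterns is proportional to the overlap of their neural circuits. A minimal, hypothetical sketch of how such overlap might be quantified (the component names and the Jaccard metric are illustrative assumptions, not the paper's method):

```python
# Illustrative sketch: quantify circuit overlap between two reasoning
# patterns as the Jaccard index over their attributed circuit components
# (e.g., attention heads and MLP blocks). Component ids are hypothetical.

def circuit_overlap(circuit_a, circuit_b):
    """Jaccard overlap between two circuits, each a set of component ids."""
    a, b = set(circuit_a), set(circuit_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Two toy circuits sharing some components:
modus_ponens = {"L3.H2", "L5.H7", "L8.mlp"}
modus_tollens = {"L3.H2", "L6.H1", "L8.mlp"}

print(circuit_overlap(modus_ponens, modus_tollens))  # 0.5
```

Under the stated law, a reshaping step such as REdit's would aim to drive this overlap down before editing, so that an edit to one pattern interferes less with the other.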
Problem

Research questions and friction points this paper is trying to address.

reasoning editing
large language models
reasoning patterns
generality-locality trade-off
neural circuits
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reasoning Editing
Circuit Reshaping
Circuit-Interference Law
Generality-Locality Trade-off
Neural Circuit Disentanglement