🤖 AI Summary
This work addresses the susceptibility of large language models to catastrophic forgetting and model collapse during continual knowledge editing. The authors propose a mechanism-aware, precise editing framework that, for the first time, integrates sparse circuits with interpretable neurons. By leveraging a sparse transcoder to construct knowledge circuits, the method identifies and manipulates specific functional neurons, enabling fine-grained knowledge updates with minimal interference to unrelated model capabilities. Designed to support lifelong learning, the approach maintains strong performance on standard benchmarks such as MMLU and GSM8K even after 3,000 consecutive edits on Gemma2, Qwen3, and Llama3.1, significantly outperforming existing techniques.
📝 Abstract
Large Language Models (LLMs) often suffer from catastrophic forgetting and collapse during sequential knowledge editing. This vulnerability stems from the prevailing dense editing paradigm, which treats models as black boxes and relies on coarse-grained parameter interventions that inevitably disrupt preserved knowledge. To address this, we propose SCAN, a sparse editing framework based on Sparse Circuit-Anchored Neurons, which transforms editing into a mechanism-aware manipulation by constructing a knowledge circuit via Sparse Transcoders. Experiments on Gemma2, Qwen3, and Llama3.1 across CounterFact, ZsRE, and WikiFactDiff demonstrate that SCAN achieves superior performance, maintaining model integrity on benchmarks such as MMLU and GSM8K even after 3,000 sequential edits, whereas existing methods deteriorate progressively as edits accumulate, eventually resulting in model collapse.
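To make the circuit-anchored intuition concrete, the toy sketch below mimics the pipeline the abstract describes: a sparse transcoder maps an activation into an overcomplete feature space, the few active features act as "anchored neurons", and the edit touches only their decoder rows, leaving all other parameters untouched. This is a minimal illustration under stated assumptions, not the paper's actual implementation; every name, dimension, and the linear transcoder itself are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_features = 16, 64  # hypothetical model/feature dimensions

# Toy sparse transcoder: encoder/decoder pair mapping an MLP activation
# into an overcomplete, sparsely-active feature space (not the paper's).
W_enc = rng.normal(size=(d_model, d_features))
W_dec = rng.normal(size=(d_features, d_model))

def transcode(h, k=4):
    """Encode h and keep only the top-k features (the sparsity step)."""
    f = np.maximum(h @ W_enc, 0.0)     # ReLU feature activations
    idx = np.argsort(f)[-k:]           # indices of the k strongest features
    mask = np.zeros_like(f)
    mask[idx] = 1.0
    return f * mask, idx

# Activation for a prompt expressing the fact to be edited (random stand-in).
h_fact = rng.normal(size=d_model)
f_sparse, anchor_neurons = transcode(h_fact)

# Minimum-norm update restricted to the anchored neurons: after the edit,
# decoding f_sparse yields the desired target activation, while rows of
# W_dec for inactive features are left completely untouched.
target = rng.normal(size=d_model)      # desired post-edit activation
delta = target - f_sparse @ W_dec
norm = (f_sparse ** 2).sum()
for i in anchor_neurons:
    W_dec[i] += f_sparse[i] * delta / norm
```

After the update, `f_sparse @ W_dec` reconstructs `target` exactly, and any input whose active features are disjoint from `anchor_neurons` decodes exactly as before, which is the locality property the abstract attributes to sparse, mechanism-aware editing.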