🤖 AI Summary
To address catastrophic forgetting of prior knowledge and poor robustness to input perturbations during continual knowledge editing in large language models (LLMs), this paper proposes a lifelong learning-oriented knowledge updating framework. Methodologically, it introduces: (1) a novel continuous data-adaptor mapping mechanism; (2) a knowledge-aware LoRA allocation loss coupled with a capability delay retention strategy, jointly optimizing edit robustness, generalization, and preservation of original model capabilities; and (3) a multi-LoRA hybrid architecture featuring a learnable router, a knowledge alignment loss, and a dynamic parameter freezing/unfreezing mechanism. Extensive experiments on GPT-2 XL and LLaMA2-7B demonstrate that the proposed method outperforms all eight baseline approaches, achieving significantly reduced forgetting rates while maintaining high edit accuracy and strong downstream task performance.
📝 Abstract
Large language models (LLMs) require model editing to efficiently update specific knowledge within them and avoid factual errors. Most model editing methods are solely designed for single-time use and result in a significant forgetting effect in lifelong editing scenarios, where sequential edits are conducted over time. Previous approaches manage sequential edits by freezing original parameters and discretely allocating new parameters for each knowledge update. However, these methods lack robustness to minor input variations due to the discrete mapping between data and parameters. To overcome this challenge, we propose ELDER, a novel approach to create a continuous association between data and adapters. ELDER integrates multiple LoRAs through a router network and is trained to establish a smooth data-adapter association, thereby enhancing the edit robustness and generalization of semantically equivalent inputs. To ensure inputs containing the same knowledge will be processed by the same LoRAs, we design a novel loss to guide the model link LoRA allocations with edit knowledge. Furthermore, we propose a deferral mechanism to retain the original LLM capabilities post-edit. Extensive experiments on GPT-2 XL and LLaMA2-7B demonstrate that ELDER effectively edits models in the lifelong setting, outperforming eight baselines while exhibiting strong scalability and preserving LLMs' general abilities on downstream tasks. Our code is available at https://github.com/JiaangL/ELDER.