🤖 AI Summary
To address catastrophic forgetting, a pervasive issue in parameter-efficient fine-tuning (PEFT) of large language models (LLMs), this paper proposes a low-damage knowledge implantation method. Our approach is the first to bring mechanistic-interpretability insights about LLM internal knowledge storage directly into PEFT design. Grounded in an analysis of how Transformers represent knowledge, it combines low-rank adaptation with a knowledge-anchoring injection strategy that explicitly preserves general-purpose capabilities while adapting to downstream tasks. Experiments across multiple LLMs and realistic scenarios show that our method matches the task performance of full fine-tuning and LoRA while significantly outperforming existing PEFT methods at retaining general capabilities, yielding a superior trade-off between task-specific adaptation and generalization preservation.
📄 Abstract
Fine-tuning adapts pretrained models to specific tasks but risks catastrophic forgetting (CF), in which critical knowledge acquired during pre-training is overwritten. Current Parameter-Efficient Fine-Tuning (PEFT) methods for Large Language Models (LLMs), while efficient, often sacrifice general capabilities. To address CF within a general-purpose PEFT framework, we propose **Lo**w-damage **K**nowledge **I**mplanting (**LoKI**), a PEFT technique grounded in a mechanistic understanding of how knowledge is stored in Transformer architectures. In two real-world scenarios, LoKI achieves task-specific performance comparable to, or even surpassing, full fine-tuning and LoRA-based methods across various model types, while preserving general capabilities significantly better. Our work connects mechanistic insights into LLM knowledge storage with practical fine-tuning objectives, achieving state-of-the-art trade-offs between task specialization and the preservation of general capabilities. Our implementation is publicly available as ready-to-use code at https://github.com/Nexround/LoKI.
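The abstract describes combining low-rank adaptation with a knowledge-anchoring strategy that shields weights believed to store general knowledge. As a rough illustration only (not LoKI's actual algorithm), the sketch below applies a standard LoRA-style low-rank update, W' = W + (α/r)·BA, and gates it with a hypothetical per-weight mask so that "anchored" entries are left untouched; the mask construction and all variable names here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 4.0

W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight matrix
A = rng.normal(size=(r, d_in)) * 0.01    # trainable low-rank factor (LoRA-style)
B = rng.normal(size=(d_out, r)) * 0.1    # trainable low-rank factor

# Hypothetical anchor mask: 1 = safe to modify, 0 = protected weight.
# In practice such a mask would come from interpretability analysis of
# where general knowledge is stored; here it is random for illustration.
mask = (rng.random(size=(d_out, d_in)) > 0.3).astype(W.dtype)

delta = (alpha / r) * (B @ A)            # low-rank update, scaled by alpha/r
W_adapted = W + mask * delta             # protected entries remain unchanged
```

The design point the mask captures: the low-rank update alone (plain LoRA) touches every entry of W, whereas gating it leaves the protected entries byte-identical to the pretrained values, which is one simple way to express "low-damage" adaptation.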