SelfAug: Mitigating Catastrophic Forgetting in Retrieval-Augmented Generation via Distribution Self-Alignment

📅 2025-09-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
In RAG scenarios, supervised fine-tuning often induces catastrophic forgetting, degrading general-purpose capabilities. We identify that RAG fine-tuning significantly shifts the model's semantic distribution, and we demonstrate that this shift strongly correlates with the severity of forgetting. To address this, we propose SelfAug, a self-distribution alignment method that requires no external general-purpose instruction data. SelfAug preserves the model's original semantic structure during fine-tuning by enforcing distributional consistency of input-sequence logits through a lightweight, plug-and-play self-alignment loss. Its core contributions are the first mechanistic characterization of distribution shift in RAG fine-tuning, coupled with an efficient loss design for implicit semantic regularization. Experiments across multiple RAG downstream tasks show that SelfAug substantially outperforms existing anti-forgetting methods, simultaneously improving task-specific performance and mitigating forgetting, thereby enhancing model generalization.
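The summary describes the loss only at a high level, so the following is a minimal sketch of what a self-alignment loss of this kind could look like: a KL-divergence penalty that keeps the fine-tuned model's token distributions on the input sequence close to those of a frozen pre-fine-tuning copy. The function name, masking convention, and temperature are illustrative assumptions, not necessarily the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def self_align_loss(student_logits, ref_logits, input_mask, temperature=1.0):
    """KL divergence between the frozen reference model's and the current
    model's token distributions, restricted to input-sequence positions.

    student_logits, ref_logits: (batch, seq_len, vocab) tensors
    input_mask: (batch, seq_len) boolean tensor selecting input tokens
    """
    s = student_logits[input_mask] / temperature  # (n_input_tokens, vocab)
    r = ref_logits[input_mask] / temperature
    # KL(ref || current): penalize the fine-tuned model for drifting away
    # from the original semantic distribution on the input sequence.
    return F.kl_div(
        F.log_softmax(s, dim=-1),
        F.softmax(r, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
```

Restricting the penalty to input (context) positions leaves the response tokens free to adapt to the downstream task while regularizing the model's representation of the retrieved input.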

📝 Abstract
Recent advancements in large language models (LLMs) have revolutionized natural language processing through their remarkable capabilities in understanding and executing diverse tasks. While supervised fine-tuning, particularly in Retrieval-Augmented Generation (RAG) scenarios, effectively enhances task-specific performance, it often leads to catastrophic forgetting, where models lose their previously acquired knowledge and general capabilities. Existing solutions either require access to general instruction data or face limitations in preserving the model's original distribution. To overcome these limitations, we propose SelfAug, a self-distribution alignment method that aligns input sequence logits to preserve the model's semantic distribution, thereby mitigating catastrophic forgetting and improving downstream performance. Extensive experiments demonstrate that SelfAug achieves a superior balance between downstream learning and general capability retention. Our comprehensive empirical analysis reveals a direct correlation between distribution shifts and the severity of catastrophic forgetting in RAG scenarios, highlighting how the absence of RAG capabilities in general instruction tuning leads to significant distribution shifts during fine-tuning. Our findings not only advance the understanding of catastrophic forgetting in RAG contexts but also provide a practical solution applicable across diverse fine-tuning scenarios. Our code is publicly available at https://github.com/USTC-StarTeam/SelfAug.
Problem

Research questions and friction points this paper is trying to address.

Supervised fine-tuning for RAG induces catastrophic forgetting of general capabilities
Fine-tuning shifts the model's semantic distribution away from its original state
Existing anti-forgetting methods either require external general instruction data or fail to preserve the model's original distribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

SelfAug, a self-distribution alignment method tailored to RAG fine-tuning
A lightweight, plug-and-play loss that aligns the distribution of input-sequence logits with the original model
Mitigates catastrophic forgetting without any external general-purpose instruction data (see the sketch after this list)
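For concreteness, here is a hedged sketch of how such a loss could plug into a standard supervised fine-tuning step. It assumes a Hugging Face-style causal LM (`model`) whose labels use -100 to mask input/context positions; `ref_model`, `lambda_align`, and the reuse of `self_align_loss` from the sketch above are illustrative assumptions, not the released SelfAug code.

```python
import copy
import torch

# Frozen snapshot of the model taken before fine-tuning; it provides the
# reference distribution that the alignment term tries to preserve.
ref_model = copy.deepcopy(model).eval()
for p in ref_model.parameters():
    p.requires_grad_(False)

def training_step(batch, lambda_align=1.0):
    # Standard SFT forward pass: cross-entropy on response tokens only
    # (input positions carry label -100, the Hugging Face convention).
    out = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=batch["labels"])
    with torch.no_grad():
        ref_out = ref_model(input_ids=batch["input_ids"],
                            attention_mask=batch["attention_mask"])
    # Align distributions on the input sequence (labels == -100),
    # excluding padding.
    input_mask = batch["labels"].eq(-100) & batch["attention_mask"].bool()
    return out.loss + lambda_align * self_align_loss(
        out.logits, ref_out.logits, input_mask)
```

Since the alignment term needs only one extra no-grad forward pass through a frozen copy and no external instruction data, this kind of design stays lightweight and plug-and-play, consistent with the summary above.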
Yuqing Huang
Harbin Institute of Technology, Shenzhen
Computer Vision
Rongyang Zhang
University of Science and Technology of China
Qimeng Wang
Xiaohongshu Inc.
Chengqiang Lu
University of Science and Technology of China
Yan Gao
Xiaohongshu Inc.
Yi Wu
Xiaohongshu Inc.
Yao Hu
Zhejiang University
Machine Learning
Xuyang Zhi
University of Science and Technology of China
Guiquan Liu
University of Science and Technology of China
Xin Li
University of Science and Technology of China
Hao Wang
University of Science and Technology of China
Enhong Chen
University of Science and Technology of China
data mining, recommender system, machine learning