LifeAlign: Lifelong Alignment for Large Language Models with Memory-Augmented Focalized Preference Optimization

📅 2025-09-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address catastrophic forgetting in large language models (LLMs) during continual alignment with human preferences, this paper proposes LifeAlign, a lifelong alignment framework built on Memory-Augmented Focalized Preference Optimization (MFPO). MFPO integrates short- to long-term memory mechanisms—specifically, denoising of short-term preference representations, intrinsic dimensionality reduction, and memory consolidation—to enable efficient storage and retrieval of historical preference knowledge. During optimization, MFPO dynamically focuses on task-critical preference features while constraining gradient updates to preserve previously acquired alignment capabilities. Evaluated on a multi-domain sequential alignment benchmark, MFPO substantially outperforms existing continual learning and alignment methods: it maintains high alignment quality while reducing average performance degradation on historical tasks by up to 42%. To the authors' knowledge, MFPO is the first approach to jointly improve both stability (retention of prior knowledge) and adaptability (acquisition of new preferences) across task sequences.
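The summary describes "constraining gradient updates to preserve previously acquired alignment capabilities" only at a high level. As one illustration of the general idea, here is a minimal gradient-projection sketch in the style of constrained continual learning; the projection rule, function names, and toy vectors are assumptions for illustration, not the paper's actual MFPO update:

```python
import numpy as np

def project_update(grad_new, grad_refs):
    """Project the new-task gradient so it does not conflict with stored
    reference (historical-task) gradients: if the update has a negative
    dot product with a reference direction, remove the conflicting
    component. A generic heuristic, not the paper's exact rule."""
    g = grad_new.astype(float).copy()
    for g_ref in grad_refs:
        dot = g @ g_ref
        if dot < 0:  # update would undo progress on a past task
            g -= dot / (g_ref @ g_ref) * g_ref
    return g

# Toy example: the new gradient conflicts with one stored direction.
g_new = np.array([1.0, -1.0])
g_old = np.array([0.0, 1.0])
g_safe = project_update(g_new, [g_old])  # conflicting component removed
```

After projection, `g_safe` is non-negative along every stored reference direction, so applying the update cannot (to first order) increase the loss on the corresponding historical tasks.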

📝 Abstract
Alignment plays a crucial role in adapting Large Language Models (LLMs) to human preferences on a specific task or domain. Traditional alignment methods suffer from catastrophic forgetting, where models lose previously acquired knowledge when adapting to new preferences or domains. We introduce LifeAlign, a novel framework for lifelong alignment that enables LLMs to maintain consistent human preference alignment across sequential learning tasks without forgetting previously learned knowledge. Our approach consists of two key innovations. First, we propose a focalized preference optimization strategy that aligns LLMs with new preferences while preventing the erosion of knowledge acquired from previous tasks. Second, we develop a short-to-long memory consolidation mechanism that merges denoised short-term preference representations into stable long-term memory using intrinsic dimensionality reduction, enabling efficient storage and retrieval of alignment patterns across diverse domains. We evaluate LifeAlign across multiple sequential alignment tasks spanning different domains and preference types. Experimental results demonstrate that our method achieves superior performance in maintaining both preference alignment quality and knowledge retention compared to existing lifelong learning approaches. The code and datasets will be released on GitHub.
Problem

Research questions and friction points this paper is trying to address.

Preventing catastrophic forgetting in LLMs during sequential alignment tasks
Maintaining consistent human preference alignment across diverse domains
Enabling efficient storage and retrieval of alignment patterns over the model's lifetime
Innovation

Methods, ideas, or system contributions that make the work stand out.

Focalized preference optimization prevents erosion of previously acquired knowledge
Memory consolidation merges denoised short-term representations into long-term memory
Intrinsic dimensionality reduction enables efficient storage and retrieval
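The abstract does not detail how intrinsic dimensionality reduction compresses preference representations, but the general pattern can be sketched with a truncated SVD. The use of SVD, the fixed rank `k`, and the function names are illustrative assumptions; the paper's mechanism estimates an intrinsic dimensionality rather than fixing `k` by hand:

```python
import numpy as np

def consolidate(short_term, k):
    """Compress a batch of short-term preference vectors (n x d) into a
    rank-k long-term memory via truncated SVD. Assumption-laden sketch,
    not the paper's actual consolidation mechanism."""
    mean = short_term.mean(axis=0)
    centered = short_term - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]               # top-k principal directions (k x d)
    codes = centered @ basis.T   # compressed coordinates (n x k)
    return mean, basis, codes

def retrieve(mean, basis, codes):
    """Reconstruct approximate preference vectors from long-term memory."""
    return codes @ basis + mean

# Toy data: 32 eight-dimensional preference vectors stored at rank 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 8))
mean, basis, codes = consolidate(X, k=4)
X_hat = retrieve(mean, basis, codes)
```

Storing `mean`, `basis`, and `codes` takes O(kd + nk) numbers instead of O(nd), which is the kind of storage-versus-fidelity trade-off a consolidation step of this sort would manage.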
Authors

Junsong Li (East China Normal University)
Jie Zhou (School of Computer Science and Technology, East China Normal University, Shanghai)
Bihao Zhan (East China Normal University)
Yutao Yang (School of Computer Science and Technology, East China Normal University, Shanghai)
Qianjun Pan (East China Normal University)
Shilian Chen (School of Computer Science and Technology, East China Normal University, Shanghai)
Tianyu Huai (East China Normal University)
Xin Li (Shanghai AI Laboratory)
Qin Chen (School of Computer Science and Technology, East China Normal University, Shanghai)
Liang He (School of Computer Science and Technology, East China Normal University, Shanghai)