Analytic Subspace Routing: How Recursive Least Squares Works in Continual Learning of Large Language Model

📅 2025-03-17

📈 Citations: 0

✨ Influential: 0

career value

190K/year

🤖 AI Summary

To address catastrophic forgetting and cross-task knowledge interference in continual learning of large language models (LLMs), this paper proposes the Analytic Subspace Routing (ASR) framework. ASR achieves task isolation by decoupling deep feature representations into orthogonal subspaces and introduces a replay-free, analytic multi-task router based on recursive least squares (RLS), enabling the first joint optimization of subspace separation and theoretically guaranteed non-forgetting routing. Integrated with low-rank adaptation (LoRA), ASR significantly reduces parameter overhead without requiring historical data replay. On multiple continual learning benchmarks, ASR achieves near-perfect retention (~100%) of prior-task knowledge while substantially improving performance on new tasks. Moreover, it reduces training computational cost by 42% compared to state-of-the-art baselines.

Technology Category

Application Category

📝 Abstract

Large Language Models (LLMs) possess encompassing capabilities that can process diverse language-related tasks. However, finetuning on LLMs will diminish this general skills and continual finetuning will further cause severe degradation on accumulated knowledge. Recently, Continual Learning (CL) in Large Language Models (LLMs) arises which aims to continually adapt the LLMs to new tasks while maintaining previously learned knowledge and inheriting general skills. Existing techniques either leverage previous data to replay, leading to extra computational costs, or utilize a single parameter-efficient module to learn the downstream task, constraining new knowledge absorption with interference between different tasks. Toward these issues, this paper proposes Analytic Subspace Routing(ASR) to address these challenges. For each task, we isolate the learning within a subspace of deep layers' features via low-rank adaptation, eliminating knowledge interference between different tasks. Additionally, we propose an analytic routing mechanism to properly utilize knowledge learned in different subspaces. Our approach employs Recursive Least Squares to train a multi-task router model, allowing the router to dynamically adapt to incoming data without requiring access to historical data. Also, the router effectively assigns the current task to an appropriate subspace and has a non-forgetting property of previously learned tasks with a solid theoretical guarantee. Experimental results demonstrate that our method achieves near-perfect retention of prior knowledge while seamlessly integrating new information, effectively overcoming the core limitations of existing methods. Our code will be released after acceptance.

Problem

Research questions and friction points this paper is trying to address.

Addresses continual learning in large language models without forgetting prior knowledge.

Eliminates task interference by isolating learning within subspaces of deep layers.

Uses Recursive Least Squares for dynamic adaptation to new tasks without historical data.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Analytic Subspace Routing isolates task learning in subspaces.

Recursive Least Squares trains a dynamic multi-task router.

Low-rank adaptation prevents knowledge interference between tasks.

🔎 Similar Papers

Large Vocabulary Size Improves Large Language Models