🤖 AI Summary
This study addresses the susceptibility of low-rank parameter-efficient fine-tuning (PEFT) methods to catastrophic forgetting in continual learning, a phenomenon whose underlying mechanisms remain poorly understood. Through empirical analysis of the geometric structure and parameterization of update subspaces in representative low-rank and tensor-decomposition approaches (including LoRA, LoRETTA, and WeGeFT), the work identifies update-subspace design as a critical factor governing forgetting behavior. The findings show that methods employing tensor decomposition (e.g., LoRETTA) or structurally aligned parameterization (e.g., WeGeFT) substantially mitigate catastrophic forgetting even under extremely limited parameter budgets, outperforming conventional shared-subspace strategies. These insights provide both theoretical grounding and practical guidance for designing efficient adaptation mechanisms in continual learning.
📝 Abstract
Parameter-efficient fine-tuning (PEFT) based on low-rank decomposition, such as LoRA, has become a standard for adapting large pretrained models. However, its behavior in sequential learning -- specifically regarding catastrophic forgetting -- remains insufficiently understood. In this work, we present an empirical study showing that forgetting is strongly influenced by the geometry and parameterization of the update subspace. While methods that restrict updates to small, shared matrix subspaces often suffer from task interference, tensor-based decompositions (e.g., LoRETTA) mitigate forgetting by capturing richer structural information within ultra-compact budgets, and structurally aligned parameterizations (e.g., WeGeFT) preserve pretrained representations. Our findings highlight update subspace design as a key factor in continual learning and offer practical guidance for selecting efficient adaptation strategies in sequential settings.
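To make the "small, shared matrix subspace" notion concrete, the following is a minimal sketch of a LoRA-style low-rank update, not the paper's implementation: the frozen pretrained weight `W` is adapted as `W + (alpha / r) * B @ A`, where only the rank-`r` factors `A` and `B` are trainable. All shapes and names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 16, 2, 4.0  # illustrative sizes; r << d_in

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection, small random init
B = np.zeros((d_out, r))                   # trainable up-projection, zero init

def forward(x, A, B):
    # Frozen base path plus the low-rank residual path, scaled by alpha / r.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapter is a no-op and the pretrained
# behavior is preserved exactly at the start of fine-tuning.
print(np.allclose(W @ x, forward(x, A, B)))  # → True
```

Because every task's update lives in the column space of `B` (a rank-`r` subspace of the `d_out`-dimensional output space), sequentially trained tasks can overwrite one another inside that shared subspace, which is the interference mechanism the abstract refers to.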