🤖 AI Summary
Neural machine translation (NMT) faces dual challenges in continual learning: catastrophic forgetting and high computational overhead. To address these, we propose an efficient continual-adaptation framework supporting multilingual, multi-domain, and stylistic adaptation. Methodologically, we first introduce a gradient-history-aware regularization strategy that explicitly constrains the evolution path of the low-rank update matrices to mitigate forgetting. Second, we design a gate-free, linearly composable LoRA module architecture that enables interactive, user-controllable adaptation without retraining. Built on parameter-efficient fine-tuning (PEFT), our approach matches full-parameter fine-tuning across multi-task continual learning scenarios while reducing trainable parameters by over 90%, significantly improving model stability and enabling real-time domain and style switching. Our core contribution is the synergistic integration of gradient-history-aware regularization with a composable LoRA architecture, establishing a new paradigm for efficient, adaptive NMT.
📝 Abstract
Continual learning in Neural Machine Translation (NMT) faces the dual challenges of catastrophic forgetting and the high computational cost of retraining. This study establishes Low-Rank Adaptation (LoRA) as a parameter-efficient framework to address these challenges in dedicated NMT architectures. We first demonstrate that LoRA-based fine-tuning adapts NMT models to new languages and domains with performance on par with full-parameter techniques, while utilizing only a fraction of the parameter space. Second, we propose an interactive adaptation method using a calibrated linear combination of LoRA modules. This approach functions as a gate-free mixture of experts, enabling real-time, user-controllable adjustments to domain and style without retraining. Finally, to mitigate catastrophic forgetting, we introduce a novel gradient-based regularization strategy specifically designed for low-rank decomposition matrices. Unlike methods that regularize the full parameter set, our approach weights the penalty on the low-rank updates using historical gradient information. Experimental results indicate that this strategy efficiently preserves prior domain knowledge while facilitating the acquisition of new tasks, offering a scalable paradigm for interactive and continual NMT.
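For context, the low-rank update underlying the abstract's LoRA framework can be sketched in a few lines of NumPy. This is a minimal illustration of the standard LoRA formulation, not the paper's code; the dimensions, `alpha`, and the `alpha / r` scaling convention are assumptions following common practice:

```python
import numpy as np

def lora_update(W, A, B, alpha=16.0):
    """LoRA-style low-rank update: W' = W + (alpha / r) * (B @ A).

    W : (d_out, d_in) frozen base weight (never trained)
    A : (r, d_in)     trainable down-projection
    B : (d_out, r)    trainable up-projection, zero-initialized so the
                      adapted model starts out identical to the base model
    """
    r = A.shape[0]
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r = 512, 512, 8
W = rng.standard_normal((d_out, d_in))
A = 0.01 * rng.standard_normal((r, d_in))
B = np.zeros((d_out, r))

# Zero-initialized B makes the initial update a no-op.
assert np.allclose(lora_update(W, A, B), W)

# Trainable parameters: r*(d_in + d_out) for LoRA vs. d_in*d_out for
# full fine-tuning -- the source of the "fraction of the parameter space".
lora_params = r * (d_in + d_out)
full_params = d_in * d_out
print(f"trainable fraction: {lora_params / full_params:.1%}")
```

With these example shapes the trainable fraction is about 3%, consistent with the summary's claim of reducing trainable parameters by over 90%.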
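The "calibrated linear combination of LoRA modules" can be read as summing several low-rank updates with user-chosen mixing coefficients in place of a learned gating network. A hedged sketch of that idea, assuming one adapter per domain or style (the function name, coefficient semantics, and per-adapter scaling are our illustration, not the authors' exact formulation):

```python
import numpy as np

def compose_lora(W, adapters, coeffs, alpha=16.0):
    """Gate-free linear composition of LoRA modules.

    adapters : list of (A_i, B_i) factor pairs, e.g. one per domain or style
    coeffs   : user-controlled mixing weights c_i (no learned gate), giving
               the effective weight W + sum_i c_i * (alpha / r_i) * (B_i @ A_i)
    """
    W_eff = W.copy()
    for c, (A, B) in zip(coeffs, adapters):
        W_eff += c * (alpha / A.shape[0]) * (B @ A)
    return W_eff

rng = np.random.default_rng(1)
d, r = 64, 4
W = rng.standard_normal((d, d))
legal  = (rng.standard_normal((r, d)), rng.standard_normal((d, r)))
formal = (rng.standard_normal((r, d)), rng.standard_normal((d, r)))

# Switching domain/style at inference time means changing coeffs,
# not retraining: the base model and adapters stay fixed.
W_legal   = compose_lora(W, [legal, formal], [1.0, 0.0])
W_blended = compose_lora(W, [legal, formal], [0.7, 0.3])
```

Because the combination is linear and gate-free, a user can interpolate between adapters in real time, which is the interactive adjustment the abstract describes.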
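One plausible reading of the gradient-based regularization strategy, sketched here under explicit assumptions: accumulate squared gradients of each low-rank factor on earlier tasks and use them as per-parameter importance weights that penalize drift from the old factor values (an EWC-style construction applied to the LoRA matrices, not the authors' exact penalty):

```python
import numpy as np

def grad_history_penalty(A, B, A_prev, B_prev, H_A, H_B, lam=0.1):
    """Hypothetical gradient-history-weighted penalty on low-rank factors.

    A_prev, B_prev : factor values after training on earlier tasks
    H_A, H_B       : accumulated squared gradients from those tasks,
                     acting as per-parameter importance weights
    Drift along directions with large historical gradients is penalized
    most, preserving prior-domain knowledge while leaving unimportant
    directions free to adapt to the new task.
    """
    return lam * (np.sum(H_A * (A - A_prev) ** 2)
                  + np.sum(H_B * (B - B_prev) ** 2))

rng = np.random.default_rng(2)
r, d = 4, 32
A_prev = rng.standard_normal((r, d))
B_prev = rng.standard_normal((d, r))
H_A = rng.random((r, d))
H_B = rng.random((d, r))

# No drift from the previous task's factors -> zero penalty.
assert grad_history_penalty(A_prev, B_prev, A_prev, B_prev, H_A, H_B) == 0.0
```

The key contrast with full-parameter methods such as EWC is scale: the penalty here touches only the `r*(d_in + d_out)` LoRA parameters, matching the abstract's point about regularizing the low-rank updates rather than the full parameter set.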