Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay

📅 2026-05-25

🤖 AI Summary

This work addresses catastrophic forgetting in language models during continual learning. Rehearsal-based methods mitigate forgetting by replaying stored data from earlier tasks, which is often impractical. The study examines how model capacity and optimization choices, in particular the learning rate, interact with forgetting. It proposes using the model's own generative ability to synthesize replay data, so that forgetting is mitigated without storing any historical data. According to the abstract, these self-generated samples nearly eliminate forgetting and allow fast finetuning at high learning rates while preserving prior knowledge, provided the model has spare capacity: models pretrained close to saturation still forget. Replay thereby removes the usual tradeoff in which low learning rates reduce forgetting but require substantially more training steps.

📝 Abstract

Models trained on a new task typically degrade on prior tasks, a phenomenon known as forgetting. Traditionally, mitigating forgetting has required replaying stored exemplars from prior tasks, which is often impractical. By contrast, language models can sample from their own training distribution, and we show that these self-generated samples serve as effective replay data, nearly eliminating forgetting. We find that forgetting nonetheless persists when the model has little remaining capacity: models pretrained close to saturation cannot absorb new information without overwriting prior knowledge. When capacity is not the limiting factor, low learning rates reduce forgetting but require substantially more training steps. Replay breaks this tradeoff, enabling fast, high-learning-rate finetuning without forgetting.
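The abstract does not say how self-generated samples are combined with new-task data. The sketch below shows one plausible way: draw samples from a frozen copy of the model taken before finetuning, then shuffle them into the new-task training set. The function `build_replay_mixture`, the `replay_fraction` parameter, and the stub sampler are all hypothetical illustrations, not the paper's method. A real sampler would call the language model's own generation routine.

```python
import random
from typing import Callable, List, Sequence


def build_replay_mixture(
    new_task_examples: Sequence[str],
    sample_from_base_model: Callable[[int], List[str]],
    replay_fraction: float = 0.5,
    seed: int = 0,
) -> List[str]:
    """Mix new-task examples with samples generated by the pre-finetuning model.

    Hypothetical helper: `replay_fraction` is the share of the final mixture
    that consists of self-generated replay samples.
    """
    if not 0.0 <= replay_fraction < 1.0:
        raise ValueError("replay_fraction must be in [0, 1)")
    n_new = len(new_task_examples)
    # Solve n_replay / (n_new + n_replay) = f for n_replay.
    n_replay = round(replay_fraction * n_new / (1.0 - replay_fraction))
    # Replay comes from the model itself, so no old-task data has to be stored.
    replay = sample_from_base_model(n_replay)
    mixture = list(new_task_examples) + list(replay)
    random.Random(seed).shuffle(mixture)  # interleave new and replay examples
    return mixture


def fake_sampler(n: int) -> List[str]:
    # Stand-in (assumption) for generation from a frozen copy of the base model.
    return [f"<base-model sample {i}>" for i in range(n)]


if __name__ == "__main__":
    new_data = ["new example A", "new example B"] * 5
    mix = build_replay_mixture(new_data, fake_sampler, replay_fraction=0.5)
    print(len(mix))  # 20: 10 new-task examples plus 10 replay samples
```

Sampling from a snapshot taken before finetuning keeps the replay data anchored to the original distribution as the weights change. Whether the paper uses a fixed snapshot or samples on the fly is not stated in the abstract.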

Problem

Research questions and friction points this paper is trying to address.

forgetting

language models

capacity

replay

continual learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

self-generated replay

catastrophic forgetting

language models