CapTrack: Multifaceted Evaluation of Forgetting in LLM Post-Training

📅 2026-02-19
📈 Citations: 1
Influential: 0

🤖 AI Summary
This work addresses the capability degradation that large language models undergo during post-training, which extends beyond the conventional notion of forgetting as a loss of parametric or factual knowledge. The authors redefine forgetting as systematic model drift that degrades behavior and user experience, and propose CapTrack, a capability-centric evaluation framework that combines a behavioral taxonomy with an evaluation suite built on established benchmarks and targeted adaptations. Using it, they run a large-scale empirical study across post-training algorithms, domains, and model families, including models up to 80B parameters. The study finds pronounced drift in robustness and default behaviors; instruction fine-tuning induces the strongest relative drift, whereas preference optimization is more conservative and can partially recover lost capabilities. Differences across model families persist, and no universal mitigation strategy emerges.
📝 Abstract
Large language model (LLM) post-training enhances latent skills, unlocks value alignment, improves performance, and enables domain adaptation. Unfortunately, post-training is known to induce forgetting, especially in the ubiquitous use-case of leveraging third-party pre-trained models, which is typically understood as a loss of parametric or factual knowledge. We argue that this accuracy-centric view is insufficient for modern foundation models and instead define forgetting as systematic model drift that degrades behavior and user experience. In this context, we introduce CapTrack, a capability-centric framework for analyzing forgetting in LLMs that combines a behavioral taxonomy with an evaluation suite built on established benchmarks and targeted adaptations. Using CapTrack, we conduct a large-scale empirical study across post-training algorithms, domains, and model families, including models up to 80B parameters. We find that forgetting extends beyond parametric knowledge, with pronounced drift in robustness and default behaviors. Instruction fine-tuning induces the strongest relative drift, while preference optimization is more conservative and can partially recover lost capabilities. Differences across model families persist, and no universal mitigation emerges.
Problem

Research questions and friction points this paper is trying to address.

forgetting
large language models
post-training
model drift
capability degradation
Innovation

Methods, ideas, or system contributions that make the work stand out.

forgetting
capability-centric evaluation
model drift
post-training
large language models