On the Plasticity and Stability for Post-Training Large Language Models

📅 2026-02-06
🤖 AI Summary
This work addresses the tension between reasoning plasticity and the retention of general capabilities in GRPO post-training by proposing Probabilistic Conflict Resolution (PCR), a Bayesian conflict-resolution framework. PCR models gradients as random variables and uses an uncertainty-aware “soft projection” to dynamically reconcile geometric conflicts between plasticity-oriented and stability-oriented gradients. This avoids a limitation of conventional deterministic projection methods, which ignore the inherent stochasticity of gradient estimates. By combining Bayesian inference, probabilistic gradient modeling, and signal-to-noise ratio optimization, the method smooths the training trajectory. Reported experiments show improvements over existing baselines across a range of reasoning tasks, balancing model adaptability with stability.

📝 Abstract
Training stability remains a critical bottleneck for Group Relative Policy Optimization (GRPO), often manifesting as a trade-off between reasoning plasticity and general capability retention. We identify a root cause as the geometric conflict between plasticity and stability gradients, which leads to destructive interference. Crucially, we argue that deterministic projection methods are suboptimal for GRPO as they overlook the intrinsic stochasticity of group-based gradient estimates. To address this, we propose Probabilistic Conflict Resolution (PCR), a Bayesian framework that models gradients as random variables. PCR dynamically arbitrates conflicts via an uncertainty-aware “soft projection” mechanism, optimizing the signal-to-noise ratio. Extensive experiments demonstrate that PCR significantly smooths the training trajectory and achieves superior performance in various reasoning tasks.
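The abstract contrasts deterministic projection with an uncertainty-aware "soft projection" but gives no formulas on this page. The following is a minimal sketch of one possible reading, not the authors' PCR algorithm. Assumptions (all hypothetical): gradients are flat lists of floats; we have several per-rollout samples of the plasticity gradient within a GRPO group; the stability gradient is treated as its sample mean; and the per-sample projections onto the stability direction are approximately Gaussian. `hard_project` is a PCGrad-style deterministic projection; `conflict_probability` and `soft_project` are illustrative names that scale the projection by the estimated probability that the conflict is real.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_vector(samples: Sequence[Vector]) -> Vector:
    # Element-wise mean of a list of equal-length gradient samples.
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def hard_project(g_p: Vector, g_s: Vector) -> Vector:
    """PCGrad-style deterministic projection: if g_p conflicts with g_s
    (negative dot product), remove g_p's component along g_s."""
    d = dot(g_p, g_s)
    norm_sq = dot(g_s, g_s)
    if d >= 0 or norm_sq == 0:
        return list(g_p)
    scale = d / norm_sq
    return [p - scale * s for p, s in zip(g_p, g_s)]


def conflict_probability(p_samples: Sequence[Vector], g_s: Vector) -> float:
    """Hypothetical uncertainty estimate: P(true <g_p, g_s> < 0) under a
    Gaussian approximation of the per-sample projections (needs >= 2 samples)."""
    d = [dot(p, g_s) for p in p_samples]
    n = len(d)
    mu = sum(d) / n
    var = sum((x - mu) ** 2 for x in d) / (n - 1)
    se = math.sqrt(var / n)  # standard error of the mean projection
    if se == 0:
        return 1.0 if mu < 0 else 0.0
    z = -mu / se  # signal-to-noise ratio of the conflict
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


def soft_project(
    p_samples: Sequence[Vector], s_samples: Sequence[Vector]
) -> Tuple[Vector, float]:
    """Hypothetical soft projection: apply the conflicting part of the
    hard projection, weighted by the conflict probability w in [0, 1]."""
    g_p = mean_vector(p_samples)
    g_s = mean_vector(s_samples)
    w = conflict_probability(p_samples, g_s)
    d = dot(g_p, g_s)
    norm_sq = dot(g_s, g_s)
    if d >= 0 or norm_sq == 0:
        return g_p, w  # no conflict in the mean: leave g_p untouched
    scale = w * d / norm_sq
    return [p - scale * s for p, s in zip(g_p, g_s)], w
```

With low-variance samples the weight approaches 1 and the result matches the hard projection; with noisy samples whose mean barely conflicts, the weight stays near 0.5 and only part of the stability-direction component is removed. The Gaussian-CDF weighting is an illustrative choice, not one confirmed by the source.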
Problem

Research questions and friction points this paper is trying to address.

training stability
reasoning plasticity
capability retention
gradient conflict
post-training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Probabilistic Conflict Resolution
Bayesian gradient modeling
soft projection
training stability
reasoning plasticity