On the Plasticity and Stability for Post-Training Large Language Models

📅 2026-02-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the trade-off between reasoning plasticity and the retention of general capabilities (stability) in GRPO post-training. It attributes training instability to a geometric conflict between plasticity-oriented and stability-oriented gradients and proposes Probabilistic Conflict Resolution (PCR), a Bayesian framework that models gradients as random variables. An uncertainty-aware "soft projection" mechanism then reconciles conflicting gradients dynamically. This overcomes a limitation of conventional deterministic projection methods, which ignore the intrinsic stochasticity of group-based gradient estimates. By combining Bayesian inference, probabilistic gradient modeling, and signal-to-noise-ratio optimization, the method significantly smooths the training trajectory. The authors report performance superior to existing baselines across a range of reasoning tasks, balancing adaptability with stability.
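The summary mentions modeling gradients as random variables and optimizing a signal-to-noise ratio (SNR) but does not define either. As an illustration only, the sketch below assumes a gradient is observed through several per-sample draws (for example, one draw per response in a GRPO group) and uses one common SNR definition: the squared norm of the mean gradient divided by the total variance of that mean. The functions `gradient_moments` and `signal_to_noise` are hypothetical names, and this is not claimed to be PCR's own estimator.

```python
import math
from typing import List, Tuple

Vector = List[float]


def gradient_moments(samples: List[Vector]) -> Tuple[Vector, Vector]:
    """Treat a gradient as a random variable observed through per-sample draws.

    Returns the sample mean of the gradient and the per-coordinate variance of
    that mean (unbiased sample variance divided by the number of draws).
    Assumption: all draws share the same dimensionality.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two gradient samples to estimate variance")
    dim = len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(dim)]
    var_of_mean = [
        sum((s[j] - mean[j]) ** 2 for s in samples) / (n - 1) / n
        for j in range(dim)
    ]
    return mean, var_of_mean


def signal_to_noise(samples: List[Vector]) -> float:
    """Hypothetical SNR: squared norm of the mean over the summed variance of the mean."""
    mean, var_of_mean = gradient_moments(samples)
    signal = sum(m * m for m in mean)
    noise = sum(var_of_mean)
    # Identical draws carry no estimation noise, so the SNR is unbounded.
    return math.inf if noise == 0 else signal / noise


if __name__ == "__main__":
    consistent = [[1.0, 0.0], [1.1, 0.1], [0.9, -0.1]]  # draws agree on direction
    noisy = [[1.0, 0.0], [-0.8, 0.5], [0.3, -0.6]]      # draws disagree
    print(signal_to_noise(consistent), signal_to_noise(noisy))
```

Under this definition, draws that agree on a direction yield a much higher SNR than draws that scatter, which is the kind of signal an uncertainty-aware method could use to decide how much to trust a gradient estimate.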

📝 Abstract

Training stability remains a critical bottleneck for Group Relative Policy Optimization (GRPO), often manifesting as a trade-off between reasoning plasticity and general capability retention. We identify one root cause: a geometric conflict between plasticity and stability gradients, which leads to destructive interference. Crucially, we argue that deterministic projection methods are suboptimal for GRPO because they overlook the intrinsic stochasticity of group-based gradient estimates. To address this, we propose Probabilistic Conflict Resolution (PCR), a Bayesian framework that models gradients as random variables. PCR dynamically arbitrates conflicts via an uncertainty-aware "soft projection" mechanism, optimizing the signal-to-noise ratio. Extensive experiments demonstrate that PCR significantly smooths the training trajectory and achieves superior performance on various reasoning tasks.
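The abstract contrasts deterministic projection with PCR's soft projection but does not give PCR's update rule. For orientation, the sketch below shows a PCGrad-style deterministic projection (a well-known example of the deterministic family, not necessarily a baseline used in this paper) next to a hypothetical soft variant that removes only a `confidence`-weighted fraction of the conflicting component. The names `soft_project`, `hard_project`, and `confidence_from_snr`, and the mapping `snr / (1 + snr)`, are illustrative assumptions, not the authors' method.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def soft_project(g_plastic: Vector, g_stable: Vector, confidence: float) -> Vector:
    """Hypothetical soft projection.

    If the gradients conflict (negative inner product), subtract
    `confidence` times the component of g_plastic along g_stable.
    confidence = 1 removes the whole conflicting component;
    confidence = 0 leaves g_plastic untouched.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    d = dot(g_plastic, g_stable)
    norm_sq = dot(g_stable, g_stable)
    if d >= 0 or norm_sq == 0:
        return list(g_plastic)  # no conflict: keep the plasticity gradient as-is
    coef = confidence * d / norm_sq
    return [p - coef * s for p, s in zip(g_plastic, g_stable)]


def hard_project(g_plastic: Vector, g_stable: Vector) -> Vector:
    """Deterministic (PCGrad-style) projection: the confidence = 1 special case."""
    return soft_project(g_plastic, g_stable, 1.0)


def confidence_from_snr(snr: float) -> float:
    """Hypothetical mapping of a non-negative SNR into [0, 1)."""
    if snr < 0:
        raise ValueError("snr must be non-negative")
    return snr / (1.0 + snr)


if __name__ == "__main__":
    g_p, g_s = [1.0, 1.0], [-1.0, 0.0]  # conflicting: dot product is -1
    print(hard_project(g_p, g_s))                               # [0.0, 1.0]
    print(soft_project(g_p, g_s, confidence_from_snr(1.0)))     # [0.5, 1.0]
```

The deterministic rule always removes the full conflicting component, even when the stability gradient is a noisy estimate; the soft variant scales that correction by how reliable the estimate appears, which is the intuition the abstract attributes to uncertainty-aware arbitration.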

Problem

Research questions and friction points this paper is trying to address.

training stability

reasoning plasticity

capability retention

gradient conflict

post-training

Innovation

Methods, ideas, or system contributions that make the work stand out.

Probabilistic Conflict Resolution

Bayesian gradient modeling

soft projection