🤖 AI Summary
This work investigates the significant performance disparities observed among large language models under identical reinforcement learning (RL) training, noting that some model families struggle to benefit from such optimization. The study introduces, for the first time, "distributional clarity" as a key structural property governing RL compatibility and quantifies it using the Silhouette Coefficient. Building on this insight, the authors propose a Silhouette-Aware Reweighting strategy that adaptively upweights low-clarity samples during training to improve learning efficiency. Extensive experiments across six mathematical reasoning benchmarks demonstrate consistent performance gains, with improvements of up to 5.9 points on AIME24, thereby validating both the trainability and broad applicability of distributional clarity as a guiding principle for RL-based model refinement.
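The paper does not spell out its exact reweighting rule here, but the idea of prioritizing low-clarity samples can be sketched with a hypothetical monotone weighting: samples whose silhouette score $S_i$ is low receive a larger loss weight. The function name, the linear form `1 + alpha * (1 - S_i)`, and the `alpha` parameter are our illustrative assumptions, not the authors' formula.

```python
def silhouette_aware_weights(sample_scores, alpha=1.0):
    """Illustrative (not the paper's exact) reweighting rule.

    sample_scores: per-sample silhouette coefficients S_i in [-1, 1].
    Returns loss weights that grow as S_i shrinks, so low-clarity
    samples are prioritized during RL training.
    """
    weights = []
    for s in sample_scores:
        s = max(-1.0, min(1.0, s))        # clip to the valid silhouette range
        weights.append(1.0 + alpha * (1.0 - s))
    return weights

# Low-S samples get the largest weights:
print(silhouette_aware_weights([1.0, 0.0, -1.0]))  # → [1.0, 2.0, 3.0]
```

In practice such weights would multiply each sample's policy-gradient loss term, shifting optimization effort toward the samples whose correct/incorrect probability clusters are least separated.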
📝 Abstract
Language model families exhibit striking disparities in their capacity to benefit from reinforcement learning: under identical training, models like Qwen achieve substantial gains, while others like Llama yield limited improvements. Complementing data-centric approaches, we reveal that this disparity reflects a hidden structural property: **distributional clarity** in probability space. Through a three-stage analysis, from phenomenon to mechanism to interpretation, we uncover that RL-friendly models exhibit intra-class compactness and inter-class separation in the probabilities they assign to correct vs. incorrect responses. We quantify this clarity using the **Silhouette Coefficient** ($S$) and demonstrate that (1) high $S$ correlates strongly with RL performance, and (2) low $S$ is associated with severe logic errors and reasoning instability. To confirm that this property is trainable, we introduce a Silhouette-Aware Reweighting strategy that prioritizes low-$S$ samples during training. Experiments across six mathematical benchmarks show consistent improvements across all model families, with gains of up to 5.9 points on AIME24. Our work establishes distributional clarity as a fundamental, trainable property underlying RL-friendliness.
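The Silhouette Coefficient the abstract refers to can be illustrated on the two clusters it describes: probabilities a model assigns to correct vs. incorrect responses. A minimal 1-D sketch (the function name and toy data are ours; the paper's exact feature space is not given here): for each point, $a$ is the mean distance to its own cluster, $b$ the mean distance to the other cluster, and $s = (b - a)/\max(a, b)$; $S$ is the mean over points.

```python
from statistics import mean

def silhouette(values, labels):
    """Mean silhouette coefficient for 1-D points with binary labels.

    values: per-response probabilities assigned by the model.
    labels: 1 for correct responses, 0 for incorrect ones.
    """
    scores = []
    for i, (v, lab) in enumerate(zip(values, labels)):
        same = [abs(v - w) for j, (w, l) in enumerate(zip(values, labels))
                if l == lab and j != i]
        other = [abs(v - w) for w, l in zip(values, labels) if l != lab]
        if not same or not other:
            continue
        a, b = mean(same), mean(other)       # intra- vs. inter-cluster distance
        scores.append((b - a) / max(a, b))
    return mean(scores)

# Compact, well-separated clusters -> S near 1 (the RL-friendly pattern)
probs  = [0.90, 0.88, 0.92, 0.15, 0.10, 0.12]
labels = [1, 1, 1, 0, 0, 0]
print(round(silhouette(probs, labels), 3))  # → 0.961
```

Overlapping clusters (e.g. correct and incorrect responses both assigned probabilities near 0.5) drive $S$ toward 0 or below, which is the low-clarity regime the reweighting strategy targets.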