How LLMs Learn to Reason: A Complex Network Perspective

📅 2025-09-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work identifies anomalous phenomena in Reinforcement Learning from Verifiable Rewards (RLVR) with large language models: two-phase learning, V-shaped response-length evolution, and severe catastrophic forgetting. To explain these, we propose the “reasoning-as-semantic-network-self-organization” hypothesis: RLVR training induces structural phase transitions in semantic complex networks, where sparse topology (mean degree ≈ 2) engenders skill isolation and abrupt capability emergence. We present the first dynamical model of RLVR as a semantic network phase transition process, enabling principled design of a maximum-frustration-point heating mechanism and SFT pre-warming to alleviate competitive bottlenecks. Based on this, we introduce Annealed-RLVR—a unified algorithmic framework integrating RLVR, supervised fine-tuning (SFT), complex network theory, and phase-transition analysis. Evaluated on a 1.5B-parameter model, Annealed-RLVR significantly mitigates forgetting and outperforms standard RLVR on both in-distribution and out-of-distribution reasoning benchmarks.
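The summary's central structural claim is that the semantic network's mean degree stays pinned near 2. As a minimal sketch (not the paper's code), for an undirected graph with N nodes and E edges the mean degree is 2E/N, so a chain-like web of skills where E ≈ N sits right at that threshold:

```python
# Minimal illustration of the "mean degree ≈ 2" claim: for an undirected
# graph with N nodes and E edges, mean degree = 2E / N. A chain-like web
# of skills (E ≈ N) therefore sits near mean degree 2.

def mean_degree(num_nodes: int, edges: list[tuple[int, int]]) -> float:
    """Average degree of an undirected graph: 2E / N."""
    return 2 * len(edges) / num_nodes

# A chain of 6 "skill" nodes: 5 edges, mean degree just under 2.
chain = [(i, i + 1) for i in range(5)]
print(mean_degree(6, chain))               # ≈ 1.67

# One extra long-range link closes the loop and reaches exactly 2.
print(mean_degree(6, chain + [(0, 5)]))    # 2.0
```

A tree has E = N − 1 (mean degree just below 2), so "mean degree pinned close to two" describes a network that is barely more connected than a tree, which is why isolated "skill islands" can persist.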

📝 Abstract
Training large language models with Reinforcement Learning from Verifiable Rewards (RLVR) exhibits a set of distinctive and puzzling behaviors that remain poorly understood, including a two-stage learning curve, V-shaped response-length trajectories, and a pronounced vulnerability to catastrophic forgetting. In this work, we propose that these seemingly disparate phenomena can be explained using a single unifying theory: the model's reasoning process maps to the self-organization of a semantic complex network whose topology remains persistently sparse, with the average degree pinned close to two. This topology imposes a fundamental mechanism for forgetting and learning: it first drives the system into a maximally frustrated state where "skill islands" form, learning slows, and forgetting is induced; it then enters a sharp growth phase where new skills are "bolted on", driven by phase-transition-like learning at the web's frontier. Equipped with this theory, we propose Annealed-RLVR, a principled algorithm that introduces an SFT-based "heating" step at the point of maximal frustration to resolve the competitive bottleneck and enhance the reasoning capability of the model. Experiments on a 1.5B-parameter model demonstrate that the approach outperforms standard RLVR on both in-distribution and out-of-distribution benchmarks. By recasting RLVR from black-box optimization into a predictable process of structural self-organization, our work provides a new physical intuition for engineering the emergent reasoning capabilities of future AI systems.
Problem

Research questions and friction points this paper is trying to address.

Explaining puzzling behaviors in RLVR training of large language models
Understanding catastrophic forgetting and two-stage learning in reasoning models
Developing theory for semantic network self-organization during reasoning learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes semantic complex network theory for reasoning
Introduces Annealed-RLVR algorithm with heating step
Uses SFT-based heating to resolve competitive bottlenecks
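The annealing schedule described above can be caricatured as a control loop: run RLVR until a "maximal frustration" signal fires, then insert an SFT heating phase before resuming. The sketch below is illustrative only; the function name and the stalled-reward heuristic are stand-ins, not the paper's actual frustration criterion:

```python
# Hedged sketch of an Annealed-RLVR-style schedule: detect the point of
# "maximal frustration" (here approximated as a stalled reward curve) and
# mark it as the step where an SFT "heating" phase would be triggered.
# The heuristic is a hypothetical stand-in for the paper's criterion.

def heating_trigger_steps(rewards, patience=3, tol=1e-3):
    """Return step indices where an SFT heating phase would be triggered.

    rewards:  per-step scalar reward curve from RLVR training.
    patience: consecutive near-flat steps taken as the frustration signal.
    tol:      reward change below which a step counts as flat.
    """
    heat_steps, flat = [], 0
    for t in range(1, len(rewards)):
        flat = flat + 1 if abs(rewards[t] - rewards[t - 1]) < tol else 0
        if flat >= patience:
            heat_steps.append(t)  # <- switch from RLVR to SFT "heating" here
            flat = 0              # resume RLVR after the heating phase
    return heat_steps

# A plateau of four equal rewards triggers heating at step 3.
print(heating_trigger_steps([0.1, 0.1, 0.1, 0.1, 0.5]))  # [3]
```

In the paper's framing, the heating step resolves competition between skill islands that a plateau like this would otherwise prolong; a real implementation would interleave actual SFT updates at the flagged steps.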
Sihan Hu
Department of Modern Physics, University of Science and Technology of China

Xiansheng Cai
Institute of Theoretical Physics, CAS
Monte Carlo, effective field theory, superconductivity, machine learning

Yuan Huang
DP Technology

Zhiyuan Yao
Ph.D. in Financial Engineering, Stevens Institute of Technology
Reinforcement Learning, Machine Learning, ML/RL in Financial Trading

Linfeng Zhang
DP Technology; AI for Science Institute
AI for Science, multi-scale modeling, molecular simulation, drug/materials design

Pan Zhang
Institute of Theoretical Physics, Chinese Academy of Sciences; School of Fundamental Physics and Mathematical Sciences, Hangzhou Institute for Advanced Study

Youjin Deng
University of Science and Technology of China
Computational Statistical Physics and Condensed-Matter Physics

Kun Chen
Institute of Theoretical Physics, Chinese Academy of Sciences