Scalable Complexity Control Facilitates Reasoning Ability of LLMs

📅 2025-05-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how model complexity control affects the generalization and reasoning capabilities of large language models (LLMs). To address the instability of scaling laws that arises from the fixed initialization standard deviation used in conventional approaches, the authors propose a new complexity-control paradigm: replacing the fixed initialization std with a constant initialization rate, jointly optimized with the weight decay coefficient. The method is broadly applicable, scalable, and simple to implement. It is evaluated systematically across model sizes up to 2.4B parameters and training data scales up to 1T tokens, showing significant improvements in reasoning generalization and faster convergence of the scaling law along both the model-size and data-scale dimensions. Notably, this work is the first to establish the initialization rate, treated as a core complexity-control variable, as a decisive factor governing LLM scaling behavior.
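A minimal formalization of the key quantity, under an assumed parameterization (the paper's exact notation may differ): reading "the exponent of std" literally, the initialization rate γ relates the per-layer initialization standard deviation to the layer width m via

$$\sigma_{\mathrm{init}} = c\,m^{-\gamma}.$$

Conventional schemes fix σ_init itself (e.g., the common 0.02) regardless of width; the proposed paradigm instead fixes γ and tunes it jointly with the weight decay coefficient, so wider models start from proportionally smaller weights.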

📝 Abstract
The reasoning ability of large language models (LLMs) has been rapidly advancing in recent years, attracting interest in more fundamental approaches that can reliably enhance their generalizability. This work demonstrates that model complexity control, conveniently implementable by adjusting the initialization rate and weight decay coefficient, improves the scaling law of LLMs consistently over varying model sizes and data sizes. This gain is further illustrated by comparing the benchmark performance of 2.4B models pretrained on 1T tokens with different complexity hyperparameters. Instead of fixing the initialization std, we found that a constant initialization rate (the exponent of std) enables the scaling law to descend faster in both model and data sizes. These results indicate that complexity control is a promising direction for the continual advancement of LLMs.
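To make the contrast with fixed-std initialization concrete, here is a minimal PyTorch sketch under the same assumed parameterization, std = fan_in ** (-gamma). The values of GAMMA and WEIGHT_DECAY are illustrative placeholders, not the paper's tuned settings, and init_with_rate is a hypothetical helper, not code from the paper:

```python
import torch
import torch.nn as nn

GAMMA = 0.6          # assumed initialization rate; not the paper's tuned value
WEIGHT_DECAY = 0.1   # weight decay tuned jointly with GAMMA (illustrative)

def init_with_rate(model: nn.Module, gamma: float = GAMMA) -> None:
    """Initialize each linear layer with std = fan_in ** (-gamma).

    Keeping gamma constant across model widths (rather than fixing the
    std itself, e.g. at 0.02) makes wider layers start from smaller
    weights; this is the assumed reading of a 'constant initialization
    rate' as the exponent of the std."""
    for layer in model.modules():
        if isinstance(layer, nn.Linear):
            fan_in = layer.weight.shape[1]  # in_features of this layer
            nn.init.normal_(layer.weight, mean=0.0, std=fan_in ** (-gamma))
            if layer.bias is not None:
                nn.init.zeros_(layer.bias)

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
init_with_rate(model)

# The two complexity hyperparameters (gamma, weight decay) are tuned jointly:
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4,
                              weight_decay=WEIGHT_DECAY)
```

Under this scheme, doubling a layer's width lowers its initialization std by a factor of 2^γ, whereas a fixed-std scheme leaves it unchanged; the abstract reports that holding the rate (rather than the std) constant makes the scaling law descend faster in both model and data size.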
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs' reasoning ability through scalable complexity control
Improving scaling laws via initialization rate and weight decay adjustments
Optimizing complexity hyperparameters for better model performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adjust initialization rate and weight decay
Constant initialization rate improves scaling
Complexity control enhances LLM generalizability
👥 Authors
Liangkai Hang (Institute of Natural Sciences, School of Mathematical Sciences, Shanghai Jiao Tong University)
Junjie Yao (Institute of Natural Sciences, School of Mathematical Sciences, Shanghai Jiao Tong University)
Zhiwei Bai (Shanghai Jiao Tong University; Machine Learning, Deep Learning)
Tianyi Chen (Institute of Natural Sciences, School of Mathematical Sciences, Shanghai Jiao Tong University)
Yang Chen (Institute of Natural Sciences, School of Mathematical Sciences, Shanghai Jiao Tong University)
Rongjie Diao (Institute of Natural Sciences, School of Mathematical Sciences, Shanghai Jiao Tong University)
Hezhou Li (Institute of Natural Sciences, School of Mathematical Sciences, Shanghai Jiao Tong University)
Pengxiao Lin (Institute of Natural Sciences, School of Mathematical Sciences, Shanghai Jiao Tong University)
Zhiwei Wang (Institute of Natural Sciences, School of Mathematical Sciences, Shanghai Jiao Tong University)
Cheng Xu (Institute of Natural Sciences, School of Mathematical Sciences, Shanghai Jiao Tong University)
Zhongwang Zhang (Shanghai Jiao Tong University)
Zhangchen Zhou (Institute of Natural Sciences, School of Mathematical Sciences, Shanghai Jiao Tong University)
Zhiyu Li (Tianjin University; Robust control, attitude control)
Zehao Lin (Center for LLM, Institute for Advanced Algorithms Research, Shanghai)
Kai Chen (MemTensor (Shanghai) Technology Co., Ltd.)
Feiyu Xiong (MemTensor (Shanghai) Technology Co., Ltd.; Machine Learning, NLP, LLM)
Yaoyu Zhang (Shanghai Jiao Tong University; Deep Learning Theory)
Weinan E (Center for Machine Learning Research, School of Mathematical Sciences, Peking University)
Hongkang Yang (MemTensor (Shanghai) Technology Co., Ltd.)
Zhi-Qin John Xu (MOE-LSC, School of Artificial Intelligence, Shanghai Jiao Tong University)