Uncovering Scaling Laws for Large Language Models via Inverse Problems

📅 2025-09-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This position paper addresses the challenge of modeling large language model (LLM) scaling laws, a process traditionally hindered by costly empirical trial and error. The authors advocate an inverse-problem framework for discovering LLM scaling laws: quantitative relationships among model size, computational cost, and downstream task performance are inferred by inverting observed performance–resource data from large-scale pretraining. Unlike conventional empirical curve fitting, this approach formulates scaling-law discovery as an interpretable, verifiable mathematical inverse problem, enabling a shift from trial-and-error-driven design to principle-driven, law-guided development. Such a paradigm would substantially reduce empirical design overhead while preserving predictive accuracy and improving cost-performance efficiency, providing both theoretical grounding and practical tools for building LLMs that meet specific performance objectives.
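The inversion idea sketched above — recovering a scaling law from observed (resource, performance) pairs — can be illustrated with a toy example. This is not the paper's actual method; it assumes a hypothetical single-variable power law and recovers its parameters from noisy synthetic observations by linear regression in log–log space.

```python
import numpy as np

# Toy inverse problem (illustrative only, not the paper's method):
# assume loss follows a power law L(N) = a * N^(-b) in model size N,
# observe noisy (N, L) pairs, and invert them to recover (a, b).

rng = np.random.default_rng(0)
a_true, b_true = 400.0, 0.34              # hypothetical ground-truth law
N = np.logspace(6, 10, 20)                # model sizes: 1e6 .. 1e10 params
L = a_true * N**(-b_true) * np.exp(rng.normal(0, 0.01, N.size))  # noisy obs

# In log-log space the law is linear: log L = log a - b * log N,
# so ordinary least squares solves this inverse problem directly.
slope, intercept = np.polyfit(np.log(N), np.log(L), deg=1)
a_hat, b_hat = np.exp(intercept), -slope

print(f"recovered a ≈ {a_hat:.1f}, b ≈ {b_hat:.3f}")  # close to 400, 0.34
```

Real scaling-law inversion is of course higher-dimensional (model size, data, compute) and requires regularization and validation, but the core structure is the same: posit a parametric law and infer its parameters from measured training runs.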

📝 Abstract
Large Language Models (LLMs) are large-scale pretrained models that have achieved remarkable success across diverse domains. These successes have been driven by unprecedented complexity and scale in both data and computations. However, due to the high costs of training such models, brute-force trial-and-error approaches to improving LLMs are not feasible. Inspired by the success of inverse problems in uncovering fundamental scientific laws, this position paper advocates that inverse problems can also efficiently uncover scaling laws that guide the building of LLMs to achieve the desired performance with significantly better cost-effectiveness.
Problem

Research questions and friction points this paper is trying to address.

Uncovering scaling laws for large language models
Improving LLM performance with better cost-effectiveness
Using inverse problems to guide efficient model building
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using inverse problems to uncover scaling laws
Guiding LLM construction for better cost-effectiveness
Avoiding brute-force trial-and-error in training
🔎 Similar Papers
Arun Verma
Singapore-MIT Alliance for Research and Technology
Sequential Decision Making · Reinforcement Learning · Large Language Models
Zhaoxuan Wu
Singapore-MIT Alliance for Research and Technology
Zijian Zhou
Singapore-MIT Alliance for Research and Technology, Dept. of Computer Science, National University of Singapore
Xiaoqiang Lin
National University of Singapore
Data-centric AI for Large Models · Data Valuation/Attribution · Prompt Optimization
Zhiliang Chen
Dept. of Computer Science, National University of Singapore, Agency for Science, Technology and Research
Rachael Hwee Ling Sim
Dept. of Computer Science, National University of Singapore
Rui Qiao
Singapore-MIT Alliance for Research and Technology, Dept. of Computer Science, National University of Singapore
Jingtan Wang
PhD, National University of Singapore
LLMs · Data-centric AI
Nhung Bui
Dept. of Computer Science, National University of Singapore
Xinyuan Niu
Dept. of Computer Science, National University of Singapore, Agency for Science, Technology and Research
Wenyang Hu
National University of Singapore
Machine Learning
Gregory Kang Ruey Lau
National University of Singapore
data-centric AI · multimodal large language models · machine learning · deep learning · physics
Zi-Yu Khoo
Dept. of Computer Science, National University of Singapore, AI Singapore
Zitong Zhao
Dept. of Computer Science, National University of Singapore
Xinyi Xu
Meta
data-centric machine learning · federated learning · multi-agent systems · cooperative game theory
Apivich Hemachandra
National University of Singapore
physics-informed machine learning · active learning
See-Kiong Ng
School of Computing and Institute of Data Science, National University of Singapore
artificial intelligence · natural language processing · data mining · smart cities · bioinformatics
Bryan Kian Hsiang Low
Associate Professor (with tenure), Department of Computer Science, National University of Singapore
Bayesian Optimization · Gaussian Processes · Federated Learning · Data-centric AI · Data Valuation