🤖 AI Summary
Modern foundation models rely on scaling laws to guide training, yet existing studies reach inconsistent and irreproducible conclusions when extrapolating optimal architectures and hyperparameters, such as the tokens-to-parameters ratio, from small-scale experiments. This stems from heterogeneous fitting methodologies, divergent training configurations, and insufficient reporting of experimental details. Method: We systematically review over 50 scaling law papers and find that although 45 adopt power-law fitting, most omit critical experimental details; through controlled-variable empirical analysis, we demonstrate that minor configuration changes induce scaling-exponent deviations exceeding 20%, substantially altering architectural recommendations. Contribution/Results: We propose the first standardized checklist for scaling law research and quantitatively characterize the sensitivity of fitting outcomes to multidimensional experimental variables, establishing a methodological foundation for more reliable and reproducible scaling studies.
📝 Abstract
Modern foundation models rely heavily on scaling laws to guide crucial training decisions. Researchers often extrapolate optimal architecture and hyperparameter settings from smaller training runs by describing the relationship between loss, or task performance, and scale. Every component of this process varies, from the specific equation being fit, to the training setup, to the optimization method. Each of these factors may affect the fitted law, and therefore the conclusions of a given study. We discuss discrepancies in the conclusions that several prior works reach on questions such as the optimal tokens-to-parameters ratio. We augment this discussion with our own analysis of the critical impact that changes in specific details can have on a scaling study and its conclusions. Additionally, we survey over 50 papers that study scaling trends: while 45 of these quantify trends using a power law, most under-report crucial details needed to reproduce their findings. To mitigate this, we propose a checklist for authors to consider when contributing to scaling law research.
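To make the abstract's point concrete, here is a minimal sketch (not the paper's code; all constants and names are illustrative) of the log-log least-squares fit many scaling-law studies use for a power law L(N) = a · N^(−α), and of how a single modeling choice, here ignoring an irreducible loss term, shifts the fitted exponent:

```python
# Hypothetical sketch of power-law fitting for scaling studies.
# All data and constants below are illustrative assumptions.
import math

def fit_power_law(ns, losses):
    """Fit loss = a * N**(-alpha) by linear regression in log-log space."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(l) for l in losses]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - slope * mx), -slope  # (prefactor a, exponent alpha)

ns = [1e6, 3e6, 1e7, 3e7, 1e8]                 # model sizes
pure = [10 * n ** -0.3 for n in ns]            # exact power law, alpha = 0.3
shifted = [l + 2.0 for l in pure]              # same law plus irreducible loss

_, alpha_pure = fit_power_law(ns, pure)        # recovers alpha = 0.3
_, alpha_shifted = fit_power_law(ns, shifted)  # badly underestimates alpha
print(alpha_pure, alpha_shifted)
```

Fitting the same runs with and without the offset term yields very different exponents, which is the kind of sensitivity to methodological detail the paper argues goes under-reported.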