🤖 AI Summary
This work addresses the high cost of the pilot experiments needed to fit scaling laws for large-scale model training. Framing the problem as budget-aware sequential experimental design, the authors propose an uncertainty-aware selection strategy that, given a finite pool of runnable experiments with heterogeneous costs, sequentially chooses the runs most useful for extrapolating to a high-cost target region. Across a diverse benchmark of scaling-law tasks, the method consistently outperforms classical design-based baselines and often approaches the accuracy of fitting on the full experimental set while using only about 10% of the total training budget.
📝 Abstract
Scaling laws are used to plan multi-million-dollar training runs, but fitting those laws can itself cost millions. In modern large-scale workflows, assembling a sufficiently informative set of pilot experiments is already a major budget-allocation problem rather than a routine preprocessing step. We formulate scaling-law fitting as budget-aware sequential experimental design: given a finite pool of runnable experiments with heterogeneous costs, choose which runs to execute so as to maximize extrapolation accuracy in a high-cost target region. We then propose an uncertainty-aware method for sequentially allocating experimental budget toward the runs most useful for target-region extrapolation. Across a diverse benchmark of scaling-law tasks, our method consistently outperforms classical design-based baselines, and often approaches the performance of fitting on the full experimental set while using only about 10% of the total training budget. Our code is available at https://github.com/PlanarG/active-sl.
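To make the problem formulation concrete, the following is a minimal sketch of budget-aware sequential selection. It is not the authors' method, which is in the linked repository. It swaps in a simple greedy heuristic under strong assumptions: the scaling law is a pure power law fitted as a linear model in log space (log L = log a - b log N), and each step picks the affordable run that most reduces the predictive variance at the target size per unit cost. The names `features`, `select_runs`, `pool_sizes`, `pool_costs`, and `ridge` are hypothetical and chosen only for illustration.

```python
import numpy as np

def features(n):
    # Assumed log-linear power law: log L = log a - b * log N,
    # so each run contributes the feature vector [1, log N].
    return np.array([1.0, np.log(n)])

def select_runs(pool_sizes, pool_costs, target_size, budget, ridge=1e-3):
    """Greedily pick runs by target-variance reduction per unit cost."""
    # Inverse of the ridge-regularized information matrix (no runs yet).
    A_inv = np.eye(2) / ridge
    x_star = features(target_size)
    remaining = list(range(len(pool_sizes)))
    chosen, spent = [], 0.0
    while True:
        best, best_score = None, 0.0
        for i in remaining:
            if spent + pool_costs[i] > budget:
                continue  # this run no longer fits in the budget
            x = features(pool_sizes[i])
            Ax = A_inv @ x
            # Sherman-Morrison: drop in x_star' A^{-1} x_star if x is added.
            gain = (x_star @ Ax) ** 2 / (1.0 + x @ Ax)
            score = gain / pool_costs[i]
            if score > best_score:
                best, best_score = i, score
        if best is None:
            break  # nothing affordable (or nothing informative) remains
        # Commit the chosen run with a rank-one update of A_inv.
        x = features(pool_sizes[best])
        Ax = A_inv @ x
        A_inv = A_inv - np.outer(Ax, Ax) / (1.0 + x @ Ax)
        chosen.append(best)
        spent += pool_costs[best]
        remaining.remove(best)
    return chosen, spent

# Hypothetical pool: model sizes with costs growing in proportion to size.
sizes = [1e6, 1e7, 1e8, 1e9]
costs = [1.0, 10.0, 100.0, 1000.0]
chosen, spent = select_runs(sizes, costs, target_size=1e10, budget=150.0)
print(chosen, spent)
```

Because the model here is linear in log space, the variance reduction does not depend on observed losses, so the design could be computed up front. An uncertainty-aware method for a nonlinear scaling law would instead refit after each run and update its selection, which is where a sequential strategy of the kind described in the abstract matters.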