The Chicken and Egg Dilemma: Co-optimizing Data and Model Configurations for LLMs

πŸ“… 2026-02-09
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the challenge of jointly optimizing data and model configurations in large language model training, a task made difficult by the tight coupling between the two. To this end, we propose JoBS, a method that enables efficient joint optimization by integrating a scaling-law-inspired performance predictor into Bayesian optimization and leveraging multi-fidelity evaluation to substantially reduce the cost of full-scale training. JoBS not only yields an optimal budget-allocation strategy but also consistently outperforms baselines that optimize only data or only model hyperparameters, as well as existing multi-fidelity Bayesian optimization approaches, achieving superior performance across diverse large language model tasks under identical computational budgets.
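
As a rough illustration of the predictor idea, here is a minimal sketch assuming the common power-law form L(s) = aΒ·s^(βˆ’b) + c over early training steps; the step counts and loss values are made-up placeholders, not results from the paper:

```python
# Illustrative sketch (not the paper's code): fit a scaling-law-style
# curve L(s) = a * s**(-b) + c to losses from a short partial training
# run, then extrapolate to the full-run horizon to score the
# configuration without paying for full training.
import numpy as np
from scipy.optimize import curve_fit

def power_law(step, a, b, c):
    # Loss decays as a power law in training steps toward a floor c.
    return a * step ** (-b) + c

# Losses logged during a cheap, truncated training run (placeholder data).
steps = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
losses = np.array([3.20, 2.95, 2.72, 2.55, 2.42])

params, _ = curve_fit(power_law, steps, losses, p0=[5.0, 0.3, 2.0], maxfev=10_000)

# The extrapolated loss at the full training horizon acts as the
# configuration's predicted score.
predicted_final_loss = power_law(50_000.0, *params)
print(f"Predicted loss at 50k steps: {predicted_final_loss:.3f}")
```

Extrapolations of this kind let a few hundred observed steps stand in for the outcome of a full run when ranking candidate configurations.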

πŸ“ Abstract
Co-optimizing data and model configurations for training LLMs presents a classic chicken-and-egg dilemma: the best training data configuration (e.g., data mixture) for a downstream task depends on the chosen model configuration (e.g., model architecture), and vice versa. However, jointly optimizing both data and model configurations is often deemed intractable, and existing methods focus on either data or model optimization without considering their interaction. We introduce JoBS, an approach that uses a scaling-law-inspired performance predictor to aid Bayesian optimization (BO) in jointly optimizing LLM training data and model configurations efficiently. JoBS allocates a portion of the optimization budget to learn an LLM performance predictor that predicts how promising a training configuration is from a small number of training steps. The remaining budget is used to perform BO entirely with the predictor, effectively amortizing the cost of full training runs. We study JoBS's average regret and devise the optimal budget allocation to minimize regret. JoBS outperforms existing multi-fidelity BO baselines, as well as data-only and model-only optimization approaches, across diverse LLM tasks under the same optimization budget.
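
To make the budget split concrete, here is a minimal sketch under stated assumptions: the toy objective, the lower-confidence-bound acquisition, and the 15/25 split are all hypothetical stand-ins, not the paper's implementation. The first slice of the budget fits the performance predictor from cheap partial runs; the remaining rounds run BO with the predictor standing in for full training:

```python
# Hypothetical JoBS-style budget allocation sketch. Phase 1 spends T1
# evaluations on short partial runs to fit a performance predictor;
# phase 2 runs a plain BO loop whose objective is the predictor itself,
# so no further training runs are paid for.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def partial_train_loss(config):
    # Stand-in for a short partial training run on a joint
    # (data mixture weight, model width) configuration.
    mix, width = config
    return (mix - 0.6) ** 2 + (width - 0.4) ** 2 + rng.normal(0.0, 0.01)

def sample_config():
    return rng.uniform(0.0, 1.0, size=2)

T, T1 = 40, 15  # total budget; slice spent on learning the predictor

# Phase 1: learn the performance predictor from T1 partial runs.
Xp = np.array([sample_config() for _ in range(T1)])
yp = np.array([partial_train_loss(x) for x in Xp])
predictor = GaussianProcessRegressor(normalize_y=True).fit(Xp, yp)

def predicted_loss(config):
    # Cheap surrogate objective: the predictor amortizes full training.
    return float(predictor.predict(np.asarray(config)[None])[0])

# Phase 2: BO (GP surrogate + lower confidence bound) evaluating
# candidates only through the predictor.
X = [sample_config() for _ in range(3)]
y = [predicted_loss(x) for x in X]
for _ in range(T - T1):
    gp = GaussianProcessRegressor(normalize_y=True).fit(np.array(X), np.array(y))
    cand = np.array([sample_config() for _ in range(256)])
    mu, sd = gp.predict(cand, return_std=True)
    nxt = cand[int(np.argmin(mu - sd))]
    X.append(nxt)
    y.append(predicted_loss(nxt))

print("Best configuration found:", X[int(np.argmin(y))])
```

How much budget to assign to phase 1 versus phase 2 is exactly the allocation question the paper analyzes through its average-regret bound.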
Problem

Research questions and friction points this paper is trying to address.

large language models
data configuration
model configuration
joint optimization
chicken-and-egg dilemma
Innovation

Methods, ideas, or system contributions that make the work stand out.

joint optimization
scaling laws
Bayesian optimization
performance prediction
large language models
Authors
Zhiliang Chen
Department of Computer Science, National University of Singapore, Singapore
Alfred Wei Lun Leong
Department of Computer Science, National University of Singapore, Singapore; Agency for Science, Technology and Research (A*STAR), Singapore
Shao Yong Ong
Department of Computer Science, National University of Singapore, Singapore
Apivich Hemachandram
Department of Computer Science, National University of Singapore, Singapore
Gregory Kang Ruey Lau
National University of Singapore
data-centric AI, multimodal large language models, machine learning, deep learning, physics
Chuan-Sheng Foo
Agency for Science, Technology and Research (A*STAR), Singapore
Zhengyuan Liu
Institute for Infocomm Research (I2R) - A*STAR; IEEE Senior Member.
Natural Language Processing, Artificial Intelligence, Human-Centered AI
Nancy F. Chen
ISCA Fellow, AAIA Fellow, Multimodal Generative AI Group Leader, AI for Education Head at A*STAR
Agentic AI, Large Language Models, Conversational AI
Bryan Kian Hsiang Low
Associate Professor (with tenure), Department of Computer Science, National University of Singapore
Bayesian Optimization, Gaussian Processes, Federated Learning, Data-centric AI, Data Valuation