$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources

📅 2024-10-30
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
The prevailing assumption that academic institutions cannot conduct large language model (LLM) pre-training due to computational constraints lacks empirical validation. Method: We survey academic researchers about their available GPUs, design a GPU-aware pre-training benchmark that measures training time on given hardware, and identify efficiency-optimized settings for resource-constrained setups, analyzing the trade-offs among compute, time, and cost. Contribution/Results: Spending 2,000 GPU-hours on benchmark experiments across a range of models and academic GPUs, we find that Pythia-1B (originally trained on 64 GPUs for 3 days) can be replicated with the same hyper-parameters in roughly 3x fewer GPU-days, i.e. on 4 GPUs in 18 days. We release all code, training pipelines, and evaluation protocols, providing a reproducible benchmark and a practical methodology for pre-training in academic settings.

📝 Abstract
Pre-training is notoriously compute-intensive and academic researchers are notoriously under-resourced. It is, therefore, commonly assumed that academics can't pre-train models. In this paper, we seek to clarify this assumption. We first survey academic researchers to learn about their available compute and then empirically measure the time to replicate models on such resources. We introduce a benchmark to measure the time to pre-train models on given GPUs and also identify ideal settings for maximizing training speed. We run our benchmark on a range of models and academic GPUs, spending 2,000 GPU-hours on our experiments. Our results reveal a brighter picture for academic pre-training: for example, although Pythia-1B was originally trained on 64 GPUs for 3 days, we find it is also possible to replicate this model (with the same hyper-parameters) in 3x fewer GPU-days: i.e. on 4 GPUs in 18 days. We conclude with a cost-benefit analysis to help clarify the trade-offs between price and pre-training time. We believe our benchmark will help academic researchers conduct experiments that require training larger models on more data. We fully release our codebase at: https://github.com/apoorvkh/academic-pretraining.
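The abstract's headline comparison (64 GPUs for 3 days vs. 4 GPUs in 18 days) can be verified with a few lines of arithmetic. A minimal sketch in Python, not taken from the paper's codebase:

```python
# Original Pythia-1B training run: 64 GPUs for 3 days.
original_gpu_days = 64 * 3   # 192 GPU-days

# Replication reported in the abstract: 4 GPUs for 18 days.
replication_gpu_days = 4 * 18  # 72 GPU-days

ratio = original_gpu_days / replication_gpu_days
print(f"{original_gpu_days} vs {replication_gpu_days} GPU-days "
      f"({ratio:.1f}x fewer)")
```

The exact ratio is about 2.7x, which the abstract rounds to "3x fewer GPU-days".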
Problem

Research questions and friction points this paper is trying to address.

Investigating academic feasibility of compute-intensive model pre-training
Benchmarking pre-training efficiency across limited GPU resources
Analyzing trade-offs between computational costs and training time
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark that measures pre-training time for a given model on given GPUs
Empirically identified settings that maximize training speed
Cost-benefit analysis clarifying the trade-off between price and pre-training time
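A cost-benefit analysis of this kind reduces to multiplying GPU count, wall-clock days, and an hourly rental rate. The sketch below illustrates the shape of the trade-off; the hourly rates are hypothetical placeholders, not figures from the paper:

```python
def training_cost(num_gpus: int, days: float, usd_per_gpu_hour: float) -> float:
    """Total rental cost in USD for a pre-training run."""
    return num_gpus * days * 24 * usd_per_gpu_hour

# Hypothetical cloud rates (USD per GPU-hour), for illustration only.
datacenter_rate = 2.0   # assumed rate for a datacenter-class GPU
consumer_rate = 0.5     # assumed rate for a consumer-grade GPU

fast = training_cost(num_gpus=64, days=3, usd_per_gpu_hour=datacenter_rate)
slow = training_cost(num_gpus=4, days=18, usd_per_gpu_hour=consumer_rate)
print(f"64 GPUs for  3 days: ${fast:,.0f}")  # higher cost, short wall-clock time
print(f" 4 GPUs for 18 days: ${slow:,.0f}")  # lower cost, long wall-clock time
```

Under these assumed rates the slower configuration is roughly an order of magnitude cheaper, which is the price-versus-time tension the paper's analysis quantifies.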