🤖 AI Summary
To address the prohibitively high computational cost and energy consumption of training billion-parameter large language models (LLMs), this paper proposes a neurogenesis-inspired progressive growth training paradigm. The method expands the model in controlled stages, from 10M to 101B parameters, via staged parameter scaling, dynamic resource allocation, LoRA-assisted fine-tuning, and mixed-precision training. The authors present the first reproducible training of an open-source 101B-parameter LLM (FLM-101B) within a $100K budget: on mainstream NLP benchmarks, the model attains 80% of the baselines' average performance while using only 10% of their training FLOPs. This 90% reduction in total training compute translates directly into a smaller carbon footprint and lower economic cost, establishing a viable, sustainable pathway for training frontier-scale LLMs without proportional increases in resource expenditure.
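The core idea behind staged parameter scaling is to grow a small model into a larger one without discarding what it has already learned: each growth step should preserve the function the network computes so training resumes from the same loss. The paper's exact growth operator is not reproduced here; below is a minimal illustrative sketch of one well-known function-preserving widening scheme (Net2Net-style neuron duplication) in NumPy. The function name `widen_layer` and the tiny two-layer MLP are hypothetical, chosen only to demonstrate the invariant.

```python
import numpy as np

def widen_layer(W1, b1, W2, new_width, rng):
    """Widen a hidden layer from W1.shape[1] to new_width units,
    preserving the network's output (Net2Net-style duplication).

    W1: (d_in, h) incoming weights, b1: (h,) biases,
    W2: (h, d_out) outgoing weights of the next layer.
    """
    old_width = W1.shape[1]
    assert new_width >= old_width
    # Pick existing units to replicate for the extra capacity.
    extra = rng.integers(0, old_width, size=new_width - old_width)
    mapping = np.concatenate([np.arange(old_width), extra])
    # How many copies of each original unit exist after widening.
    counts = np.bincount(mapping, minlength=old_width)
    W1_new = W1[:, mapping]          # duplicate incoming weights
    b1_new = b1[mapping]             # duplicate biases
    # Split each unit's outgoing weights across its copies so the
    # downstream sum, and hence the output, is unchanged.
    W2_new = W2[mapping, :] / counts[mapping][:, None]
    return W1_new, b1_new, W2_new

# Check that widening leaves the MLP's output unchanged.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3)); b1 = rng.normal(size=3)
W2 = rng.normal(size=(3, 2))
x = rng.normal(size=(5, 4))
y_old = np.maximum(x @ W1 + b1, 0) @ W2          # ReLU MLP forward
W1n, b1n, W2n = widen_layer(W1, b1, W2, 6, rng)  # grow 3 -> 6 units
y_new = np.maximum(x @ W1n + b1n, 0) @ W2n
assert np.allclose(y_old, y_new)
```

Because replicated units receive identical pre-activations, any elementwise nonlinearity replicates their activations too, and dividing the outgoing weights by the copy count keeps the downstream sum intact. Applied repeatedly (together with depth growth), this is the kind of operator that lets a schedule spend most FLOPs at small model sizes.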
📝 Abstract
Large language models (LLMs) are considered an important approach toward foundational machine intelligence, achieving remarkable success in Natural Language Processing, multimodal tasks, and beyond. However, the carbon footprint and financial cost originating from heavy pre-training computation are a non-negligible issue. Progressive training methods, inspired by the neurogenesis process that grows neural structures, have shown potential to accelerate LLM pre-training. However, the algorithms, implementation, and practices for progressively training LLMs beyond 100B parameters remain underexplored. In this paper, we show that our model, namely FLM-101B, trained with our growth strategy under a budget of $100K, reaches 80% of the baselines' performance with only 10% of their floating-point operations. We believe that further studies on progressive training will benefit the community by cutting down costs and promoting green AI. The checkpoint of FLM-101B is released at https://huggingface.co/CofeAI/FLM-101B.