FLM-101B: An Open LLM and How to Train It with $100K Budget

📅 2023-09-07
🏛️ arXiv.org
📈 Citations: 21
Influential: 0
📄 PDF
🤖 AI Summary
To address the prohibitively high computational cost and energy consumption of training billion-parameter large language models (LLMs), this paper proposes a neurogenesis-inspired progressive growth training paradigm. The model is expanded in stages up to 101B parameters through staged parameter scaling combined with dynamic resource allocation and mixed-precision training. The result is FLM-101B, an open-source 101B-parameter LLM trained within a $100K budget that reaches roughly 80% of the baselines' average performance while using only about 10% of their training FLOPs, as verified on mainstream NLP benchmarks. The corresponding ~90% reduction in training compute yields substantial savings in carbon footprint and economic cost, pointing to a viable, sustainable path for training frontier-scale LLMs without proportional increases in resource expenditure.
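The growth idea can be pictured with a small, hedged sketch: a layer trained at one width is carried into a wider layer by copying its weights, so later stages start from what earlier, cheaper stages already learned. The `widen_linear` helper and the stage schedule below are illustrative assumptions, not the paper's actual growth operators or training schedule.

```python
# Minimal sketch of staged width growth for a single linear layer (PyTorch).
# Hypothetical helper; FLM-101B's real growth operators and schedule differ.
import torch
import torch.nn as nn

def widen_linear(layer: nn.Linear, new_in: int, new_out: int) -> nn.Linear:
    """Return a wider nn.Linear whose top-left block reuses the trained weights."""
    grown = nn.Linear(new_in, new_out)
    with torch.no_grad():
        # Carry over learned parameters; newly added rows/columns keep fresh init.
        grown.weight[: layer.out_features, : layer.in_features] = layer.weight
        grown.bias[: layer.out_features] = layer.bias
    return grown

# Illustrative schedule: train at a small width, grow, keep training.
stages = [(256, 256), (512, 512), (1024, 1024)]  # (in_features, out_features)
layer = nn.Linear(*stages[0])
for in_f, out_f in stages[1:]:
    # ... train `layer` for this stage here ...
    layer = widen_linear(layer, new_in=in_f, new_out=out_f)
```

The point of the carry-over is that most optimization steps are spent while the model is still small, which is where the claimed FLOPs savings come from.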
📝 Abstract
Large language models (LLMs) are considered important approaches towards foundational machine intelligence, achieving remarkable success in Natural Language Processing and multimodal tasks, among others. However, the carbon footprint and financial cost originating from heavy pre-training computation are a non-negligible issue. Progressive training methods, inspired by the neurogenesis process that grows neural structures, have shown potential to accelerate LLM pre-training. However, the algorithms, implementation, and practices for progressively training LLMs beyond 100B parameters remain underexplored. In this paper, we show that our model, namely FLM-101B, trained with our growth strategy under a budget of $100K, reaches 80% of the baselines' performances with only 10% of their floating-point operations. We believe that further studies on progressive training will benefit the community by cutting down the costs and promoting green AI. The checkpoint of FLM-101B is released at https://huggingface.co/CofeAI/FLM-101B.
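As a back-of-the-envelope check of the "10% of the floating-point operations" claim, dense-transformer training compute is often estimated with the common FLOPs ≈ 6 × parameters × tokens rule of thumb. The parameter and token counts below are placeholders for illustration, not the paper's actual training configuration.

```python
# Rough training-compute estimate via the common "6 * params * tokens" rule.
# Placeholder numbers for illustration only; not FLM-101B's real setup.
def train_flops(params: float, tokens: float) -> float:
    return 6.0 * params * tokens

baseline = train_flops(params=101e9, tokens=300e9)   # hypothetical dense 101B run
progressive = 0.10 * baseline                        # "10% of baseline FLOPs"
print(f"baseline ~ {baseline:.2e} FLOPs, growth-based ~ {progressive:.2e} FLOPs")
```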
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Cost-effective Training
Environmental Impact
Innovation

Methods, ideas, or system contributions that make the work stand out.

FLM-101B
cost-efficient
eco-friendly
👥 Authors
Xiang Li
Beijing Academy of Artificial Intelligence, Beijing, China
Yiqun Yao
Unknown affiliation
Xin Jiang
Beijing Academy of Artificial Intelligence, Beijing, China
Xuezhi Fang
Beijing Academy of Artificial Intelligence, Beijing, China
Xuying Meng
Institute of Computing Technology, Chinese Academy of Sciences
Siqi Fan
University of Electronic Science and Technology of China, Chengdu, China
Peng Han
Professor, Department of Computer Science, UESTC
drug discovery, spatial temporal, data mining
Jing Li
Harbin Institute of Technology, Shenzhen, China
Li Du
Beijing Academy of Artificial Intelligence, Beijing, China
Bowen Qin
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Zheng Zhang
Beijing Academy of Artificial Intelligence, Beijing, China
Aixin Sun
School of Computer Science and Engineering, Nanyang Technological University, Singapore
Yequan Wang
Beijing Academy of Artificial Intelligence, Beijing, China