LLM360 K2: Scaling Up 360-Open-Source Large Language Models

📅 2025-01-13
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the opacity and irreproducibility of large language model (LLM) training caused by prohibitive costs, this work open-sources K2 DIAMOND, a 65B-parameter LLM, and achieves, for the first time, full transparency across the entire training lifecycle, including data, code, logs, and hardware configurations. We propose a "360-degree full-stack open-source paradigm," introducing a low-FLOP, low-token training pathway and establishing critical best practices, such as maintaining loss stability. Key technical components include mixed-precision training, dynamic sequence-length scheduling, fine-grained logging, a custom distributed optimizer, and an end-to-end data deduplication pipeline. Experiments show that K2 DIAMOND outperforms LLaMA-65B and matches LLaMA2-70B while reducing training FLOPs by 18% and token consumption by 22%. Complementing this, we launch the TXT360 initiative to advance open, reproducible AI research.
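The summary names loss stability among K2's best practices. As a hedged illustration only (this is not the project's actual training code; `make_spike_detector`, the window size, and the threshold are all assumptions), a loss spike can be flagged when a step's loss jumps well above a running statistic of recent losses:

```python
from collections import deque

def make_spike_detector(window: int = 100, threshold: float = 2.0):
    """Return a callable that flags a training step whose loss exceeds
    the recent mean by `threshold` standard deviations (illustrative)."""
    history = deque(maxlen=window)

    def check(loss: float) -> bool:
        spike = False
        if len(history) >= 10:  # wait for enough samples before judging
            mean = sum(history) / len(history)
            var = sum((x - mean) ** 2 for x in history) / len(history)
            spike = loss > mean + threshold * max(var ** 0.5, 1e-8)
        history.append(loss)
        return spike

    return check

detector = make_spike_detector()
for step, loss in enumerate([2.0] * 50 + [8.0]):
    if detector(loss):
        print(f"loss spike at step {step}: {loss:.2f}")  # fires at step 50
```

In practice, a detector like this would typically trigger a mitigation such as skipping the batch or rolling back to a recent checkpoint; the report's full-transparency logs are what make such recipes reproducible.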

πŸ“ Abstract
We detail the training of the LLM360 K2-65B model, scaling up our 360-degree OPEN SOURCE approach to the largest and most powerful models under project LLM360. While open-source LLMs continue to advance, the answer to"How are the largest LLMs trained?"remains unclear within the community. The implementation details for such high-capacity models are often protected due to business considerations associated with their high cost. This lack of transparency prevents LLM researchers from leveraging valuable insights from prior experience, e.g.,"What are the best practices for addressing loss spikes?"The LLM360 K2 project addresses this gap by providing full transparency and access to resources accumulated during the training of LLMs at the largest scale. This report highlights key elements of the K2 project, including our first model, K2 DIAMOND, a 65 billion-parameter LLM that surpasses LLaMA-65B and rivals LLaMA2-70B, while requiring fewer FLOPs and tokens. We detail the implementation steps and present a longitudinal analysis of K2 DIAMOND's capabilities throughout its training process. We also outline ongoing projects such as TXT360, setting the stage for future models in the series. By offering previously unavailable resources, the K2 project also resonates with the 360-degree OPEN SOURCE principles of transparency, reproducibility, and accessibility, which we believe are vital in the era of resource-intensive AI research.
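The report also describes an end-to-end data deduplication pipeline for the training corpus. The paper's actual pipeline is not reproduced here; the sketch below shows only the exact-match stage that such pipelines typically start with (`dedup_exact` and its normalization choices are assumptions for illustration):

```python
import hashlib

def dedup_exact(docs):
    """Drop exact duplicate documents by hashing normalized text.
    A minimal sketch; production pipelines usually layer fuzzy
    matching (e.g. MinHash) on top of exact hashing."""
    seen = set()
    unique = []
    for doc in docs:
        # Normalize lightly so trivial variants hash identically.
        key = hashlib.sha256(doc.strip().lower().encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

corpus = ["The cat sat.", "the cat sat.", "A different doc."]
print(dedup_exact(corpus))  # → ['The cat sat.', 'A different doc.']
```

Hashing keeps memory bounded to one digest per unique document, which matters when the corpus is measured in trillions of tokens.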
Problem

Research questions and friction points this paper is trying to address.

AI Research
Language Model Training
Transparency Issues
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale Language Model Training
Resource-efficient Training Strategies
Transparent AI Research Practices
🔎 Similar Papers
No similar papers found.