Covenant-72B: Pre-Training a 72B LLM with Trustless Peers Over the Internet

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently pretraining extremely large language models in a permissionless, decentralized global environment. The authors propose a trust-minimized collaborative framework built on a blockchain protocol, integrating the communication-efficient SparseLoCo optimizer and a dynamic node participation mechanism. This system enables the first globally coordinated pretraining of a 72B-parameter model without requiring a whitelist of trusted participants. It supports nodes joining and exiting dynamically while training on approximately 1.1 trillion tokens, and achieves model performance comparable to, or even exceeding, that of centralized approaches with similar or higher computational budgets. The approach thus overcomes critical scalability and performance bottlenecks that have previously hindered large-scale decentralized training.

📝 Abstract
Recently, there has been increased interest in globally distributed training, which has the promise to both reduce training costs and democratize participation in building large-scale foundation models. However, existing models trained in a globally distributed manner are relatively small in scale and have only been trained with whitelisted participants. Therefore, they do not yet realize the full promise of democratized participation. In this report, we describe Covenant-72B, an LLM produced by the largest collaborative globally distributed pre-training run (in terms of both compute and model scale), which simultaneously allowed open, permissionless participation supported by a live blockchain protocol. We utilized a state-of-the-art communication-efficient optimizer, SparseLoCo, supporting dynamic participation with peers joining and leaving freely. Our model, pre-trained on approximately 1.1T tokens, performs competitively with fully centralized models pre-trained on similar or higher compute budgets, demonstrating that fully democratized, non-whitelisted participation is not only feasible, but can be achieved at unprecedented scale for a globally distributed pre-training run.
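The abstract does not detail SparseLoCo's mechanics, but as a member of the DiLoCo family of communication-efficient optimizers, its core idea can be sketched: peers run many local steps, then periodically exchange compressed (top-k sparsified) pseudo-gradients, and peers that drop out simply contribute nothing that round. The sketch below is a minimal illustration under those assumptions; the function names and the outer learning rate are illustrative, and SparseLoCo details such as error feedback and quantization are omitted.

```python
import numpy as np

def top_k_sparsify(vec, k):
    """Keep only the k largest-magnitude entries of vec; zero the rest.

    This is the sparsification step that makes each peer's update cheap
    to communicate over the internet.
    """
    out = np.zeros_like(vec)
    idx = np.argpartition(np.abs(vec), -k)[-k:]
    out[idx] = vec[idx]
    return out

def outer_step(global_params, peer_params_list, k, outer_lr=0.7):
    """One DiLoCo-style outer update from sparsified pseudo-gradients.

    Each peer's pseudo-gradient is the displacement between the shared
    global parameters and that peer's locally trained parameters.
    Dynamic participation falls out naturally: only currently active
    peers appear in peer_params_list.
    """
    if not peer_params_list:
        return global_params  # no active peers this round; nothing to apply
    pseudo_grads = [top_k_sparsify(global_params - p, k)
                    for p in peer_params_list]
    avg = np.mean(pseudo_grads, axis=0)
    return global_params - outer_lr * avg
```

For example, with a single peer whose local training moved only the first coordinate, a top-1 sparsified outer step applies exactly that coordinate's pseudo-gradient scaled by the outer learning rate, while the other coordinates are untouched.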
Problem

Research questions and friction points this paper is trying to address.

globally distributed training
democratized participation
trustless peers
large language model
permissionless
Innovation

Methods, ideas, or system contributions that make the work stand out.

globally distributed training
permissionless participation
blockchain protocol
SparseLoCo optimizer
large language model