SpikingBrain Technical Report: Spiking Brain-inspired Large Models

📅 2025-09-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Mainstream Transformer models face training compute that scales quadratically with sequence length and inference memory that grows linearly, while stable, efficient training on non-NVIDIA hardware remains difficult. To address these limitations, we propose SpikingBrain, a family of brain-inspired spiking large language models that integrates adaptive spiking neurons, linear and hybrid-linear attention, and an event-driven sparse activation mechanism (69.15% sparsity), enabling near-constant-memory inference and efficient ultra-long-sequence processing. Leveraging a conversion-based training pipeline, a dedicated spike coding framework, and system-level optimizations tailored to the MetaX GPU cluster, we successfully train 7B- and 76B-parameter models. On 4M-token sequences, SpikingBrain achieves over 100× speedup in Time to First Token; the 7B model reaches 23.4% Model FLOPs Utilization and matches open-source baseline performance after only about 150B tokens of continual pre-training.
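To make the near-constant-memory claim concrete, the sketch below shows a generic linear-attention decode step in Python. The function and variable names are illustrative assumptions, not SpikingBrain's actual kernel: the point is only that the growing key-value cache of softmax attention is replaced by a fixed-size recurrent state, so per-token memory stays flat even at 4M-token contexts.

```python
import numpy as np

def linear_attention_decode_step(state, q_t, k_t, v_t):
    """One decode step of a simplified linear-attention recurrence.

    Instead of a KV cache that grows with sequence length, the layer keeps a
    fixed-size state of shape (d_k, d_v), so inference memory is constant.
    Illustrative sketch only; SpikingBrain's gating/normalization will differ.
    """
    state = state + np.outer(k_t, v_t)   # accumulate key-value outer products
    out = q_t @ state                    # read out with the current query
    return state, out

# Usage: memory is O(d_k * d_v) no matter how many tokens are processed.
d_k = d_v = 64
state = np.zeros((d_k, d_v))
rng = np.random.default_rng(0)
for _ in range(1000):                    # could be 4M steps; state never grows
    q_t, k_t, v_t = rng.normal(size=(3, d_k))
    state, out = linear_attention_decode_step(state, q_t, k_t, v_t)
```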

📝 Abstract
Mainstream Transformer-based large language models face major efficiency bottlenecks: training computation scales quadratically with sequence length, and inference memory grows linearly, limiting long-context processing. Building large models on non-NVIDIA platforms also poses challenges for stable and efficient training. To address this, we introduce SpikingBrain, a family of brain-inspired models designed for efficient long-context training and inference. SpikingBrain leverages the MetaX GPU cluster and focuses on three aspects: (1) Model Architecture: linear and hybrid-linear attention architectures with adaptive spiking neurons; (2) Algorithmic Optimizations: an efficient, conversion-based training pipeline and a dedicated spike coding framework; (3) System Engineering: customized training frameworks, operator libraries, and parallelism strategies tailored to MetaX hardware. Using these techniques, we develop two models: SpikingBrain-7B, a linear LLM, and SpikingBrain-76B, a hybrid-linear MoE LLM. These models demonstrate the feasibility of large-scale LLM development on non-NVIDIA platforms. SpikingBrain achieves performance comparable to open-source Transformer baselines while using only about 150B tokens for continual pre-training. Our models significantly improve long-sequence training efficiency and deliver inference with (partially) constant memory and event-driven spiking behavior. For example, SpikingBrain-7B attains over 100x speedup in Time to First Token for 4M-token sequences. Training remains stable for weeks on hundreds of MetaX C550 GPUs, with the 7B model reaching a Model FLOPs Utilization of 23.4 percent. The proposed spiking scheme achieves 69.15 percent sparsity, enabling low-power operation. Overall, this work demonstrates the potential of brain-inspired mechanisms to drive the next generation of efficient and scalable large model design.
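As a rough illustration of the "adaptive spiking neurons" and "spike coding" ingredients named in the abstract, here is a hypothetical adaptive-threshold encoder. The threshold rule and all names are assumptions for exposition, not the paper's scheme: activations below an adaptive threshold emit no spikes (event-driven sparsity), while larger activations emit proportionally more integer spike counts.

```python
import numpy as np

def adaptive_spike_encode(x, eps=1e-8):
    """Encode a continuous activation vector as sparse, signed spike counts.

    Hypothetical adaptive-threshold scheme for illustration only: the
    threshold tracks the mean absolute activation, so small values stay
    silent and large values emit more spikes.
    """
    theta = np.abs(x).mean() + eps          # adaptive threshold per tensor
    counts = np.floor(np.abs(x) / theta)    # integer spike counts
    return np.sign(x) * counts, theta       # signed counts + threshold used

x = np.random.default_rng(1).normal(scale=0.5, size=1024)
spikes, theta = adaptive_spike_encode(x)
sparsity = float((spikes == 0).mean())      # fraction of silent units
approx = spikes * theta                     # coarse reconstruction of x
```

Sparsity of this kind is what allows event-driven hardware to skip work on silent units, which is the low-power argument behind the reported 69.15 percent figure.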
Problem

Research questions and friction points this paper is trying to address.

Addresses quadratic computation scaling in Transformer training
Solves linear memory growth during long-context inference (see the scaling comparison after this list)
Enables efficient large model training on non-NVIDIA platforms
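For orientation, the standard asymptotic contrast behind the first two bottlenecks, for sequence length T and model width d (textbook complexities, not measurements from the paper):

```latex
\[
\text{softmax attention: }
\underbrace{\mathcal{O}(T^{2} d)}_{\text{training compute}},\quad
\underbrace{\mathcal{O}(T d)}_{\text{KV-cache memory}}
\qquad\text{vs.}\qquad
\text{linear attention: }
\underbrace{\mathcal{O}(T d^{2})}_{\text{training compute}},\quad
\underbrace{\mathcal{O}(d^{2})}_{\text{recurrent-state memory}}
\]
```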
Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear and hybrid-linear attention with adaptive spiking neurons (see the layer-layout sketch after this list)
Efficient conversion-based training pipeline and spike coding
Customized training frameworks and parallelism for MetaX hardware
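The hybrid-linear idea can be pictured as an interleaving of layer types. The ratio and pattern below are assumptions for illustration, not SpikingBrain-76B's published configuration: most layers use constant-memory linear attention, with occasional full softmax-attention layers retained for long-range recall.

```python
def hybrid_layer_plan(num_layers: int, full_attn_every: int = 4) -> list[str]:
    """Attention type per layer in a hypothetical hybrid-linear stack."""
    return [
        "softmax" if (i + 1) % full_attn_every == 0 else "linear"
        for i in range(num_layers)
    ]

print(hybrid_layer_plan(8))
# ['linear', 'linear', 'linear', 'softmax', 'linear', 'linear', 'linear', 'softmax']
```

Authors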
Yuqi Pan
Institute of Automation, Chinese Academy of Sciences
Yupeng Feng
Institute of Automation, Chinese Academy of Sciences
Jinghao Zhuang
Institute of Automation, Chinese Academy of Sciences
Siyu Ding
Institute of Automation, Chinese Academy of Sciences
Zehao Liu
Institute of Automation, Chinese Academy of Sciences
Bohan Sun
Institute of Automation, Chinese Academy of Sciences
Yuhong Chou
The Hong Kong Polytechnic University
foundation model · deep learning · language model
Han Xu
Institute of Automation, Chinese Academy of Sciences
Xuerui Qiu
Institute of Automation, Chinese Academy of Sciences
Representation Learning · 3D Computer Vision · Model Compression
Anlin Deng
Institute of Automation, Chinese Academy of Sciences
Anjie Hu
Institute of Automation, Chinese Academy of Sciences
Peng Zhou
LuxiTech
Man Yao
Institute of Automation, Chinese Academy of Sciences
Jibin Wu
The Hong Kong Polytechnic University
Spiking Neural Networks · Neuromorphic Computing · Speech Processing · Cognitive Modelling
Jian Yang
MetaX Integrated Circuit Co., Ltd.
Guoliang Sun
MetaX Integrated Circuit Co., Ltd.
Bo Xu
Institute of Automation, Chinese Academy of Sciences
Guoqi Li
Professor, Institute of Automation, Chinese Academy of Sciences; previously Tsinghua University
Brain-inspired computing · Spiking neural networks · Brain-inspired large models · NeuroAI