MiniCPM4: Ultra-Efficient LLMs on End Devices

📅 2025-06-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of efficiently deploying large language models (LLMs) on resource-constrained edge devices, this paper introduces the MiniCPM4 series (0.5B/8B), which integrates several novel techniques: (1) InfLLM v2, a trainable sparse attention mechanism; (2) BitCPM, a data-efficient ternary quantization scheme, complemented by quantized inference in the serving stack; (3) chunk-wise rollout for load-balanced reinforcement learning; (4) the UltraClean data cleaning pipeline and the high-quality UltraChat v2 fine-tuning dataset; (5) ModelTunnel v2, a framework for efficient pre-training strategy search; and (6) CPM.cu, a unified inference engine combining sparse attention, quantization, and speculative sampling. The models preserve robust long-context modeling capability while substantially improving inference speed and energy efficiency. On mainstream benchmarks, MiniCPM4-8B outperforms open-source LLMs of comparable parameter count, and it processes long sequences markedly faster than Qwen3-8B. The framework has been successfully deployed in real-world edge applications, including trustworthy survey generation and tool-augmented reasoning.
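The summary names InfLLM v2 only at a high level. As a rough illustration of the block-sparse idea behind such trainable sparse attention (each query attends to a handful of relevant key/value blocks rather than the full context), here is a minimal PyTorch sketch; the mean-pooled block scoring, shapes, and hyperparameters are illustrative assumptions, not the paper's actual design.

```python
import torch

def block_sparse_attention(q, k, v, block_size=64, top_k=4):
    """Illustrative block-sparse attention (NOT InfLLM v2's exact design):
    each query attends only to its top-k key/value blocks, scored by a cheap
    block-level relevance estimate. Shapes: q (n_q, d), k/v (n_kv, d).
    Single head, no causal mask, for clarity."""
    n_kv, d = k.shape
    n_blocks = (n_kv + block_size - 1) // block_size
    # Pad keys/values so they split evenly into blocks (padding blocks may
    # still be selected; acceptable for a sketch).
    pad = n_blocks * block_size - n_kv
    k_pad = torch.cat([k, k.new_zeros(pad, d)])
    v_pad = torch.cat([v, v.new_zeros(pad, d)])
    k_blocks = k_pad.view(n_blocks, block_size, d)
    # Cheap relevance proxy: query dotted with the mean-pooled block vector.
    block_repr = k_blocks.mean(dim=1)                      # (n_blocks, d)
    block_scores = q @ block_repr.T                        # (n_q, n_blocks)
    top_blocks = block_scores.topk(min(top_k, n_blocks), dim=-1).indices
    out = torch.empty_like(q)
    for i in range(q.shape[0]):
        idx = top_blocks[i]                                # selected blocks
        k_sel = k_pad.view(n_blocks, block_size, d)[idx].reshape(-1, d)
        v_sel = v_pad.view(n_blocks, block_size, d)[idx].reshape(-1, d)
        attn = torch.softmax(q[i] @ k_sel.T / d**0.5, dim=-1)
        out[i] = attn @ v_sel
    return out
```

The point of the block-level score is that it costs O(n_blocks) per query instead of O(n_kv), which is where sparse-attention speedups in both prefilling and decoding come from.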

📝 Abstract
This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelerates both prefilling and decoding phases for long-context processing. Regarding training data, we propose UltraClean, an efficient and accurate pre-training data filtering and generation strategy, and UltraChat v2, a comprehensive supervised fine-tuning dataset. These datasets enable satisfactory model performance to be achieved using just 8 trillion training tokens. Regarding training algorithms, we propose ModelTunnel v2 for efficient pre-training strategy search, and improve existing post-training methods by introducing chunk-wise rollout for load-balanced reinforcement learning and the data-efficient ternary LLM BitCPM. Regarding inference systems, we propose CPM.cu, which integrates sparse attention, model quantization, and speculative sampling to achieve efficient prefilling and decoding. To meet diverse on-device requirements, MiniCPM4 is available in two versions, with 0.5B and 8B parameters, respectively. Extensive evaluation results show that MiniCPM4 outperforms open-source models of similar size across multiple benchmarks, highlighting both its efficiency and effectiveness. Notably, MiniCPM4-8B demonstrates significant speed improvements over Qwen3-8B when processing long sequences. Through further adaptation, MiniCPM4 successfully powers diverse applications, including trustworthy survey generation and tool use with the Model Context Protocol, clearly showcasing its broad usability.
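For intuition on the "ternary LLM" direction that BitCPM pursues, the following is a hedged sketch of absmean ternary weight quantization in the style popularized by BitNet b1.58; BitCPM's actual quantization-aware training procedure is not described on this page, so every detail below is an assumption for illustration only.

```python
import torch

def ternarize_weights(w, eps=1e-6):
    """Illustrative ternary quantization (assumed scheme, not BitCPM's):
    map each weight to {-1, 0, +1} times a per-tensor scale, using
    absmean rounding."""
    scale = w.abs().mean().clamp(min=eps)         # per-tensor scale
    w_ternary = (w / scale).round().clamp(-1, 1)  # values in {-1, 0, +1}
    return w_ternary, scale

def dequantize(w_ternary, scale):
    """Reconstruct an approximate full-precision weight for matmul."""
    return w_ternary * scale

# Quick check of the reconstruction error on random weights.
w = torch.randn(256, 256)
wq, s = ternarize_weights(w)
err = (w - dequantize(wq, s)).abs().mean()
print(f"ternary values used: {wq.unique().tolist()}, "
      f"mean abs error: {err.item():.4f}")
```

Ternary weights need under 1.6 bits each in principle, which is why this family of methods is attractive for memory-bound end-side inference.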
Problem

Research questions and friction points this paper is trying to address.

Develop ultra-efficient LLMs for end-side devices
Jointly optimize model architecture, training data, training algorithms, and inference systems
Achieve strong performance with only 8 trillion training tokens
Innovation

Methods, ideas, or system contributions that make the work stand out.

InfLLM v2 trainable sparse attention for long-context processing
UltraClean data filtering and the UltraChat v2 fine-tuning dataset
CPM.cu integrates sparse attention, quantization, and speculative sampling (a sketch of the acceptance rule follows this list)
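As a reference point for the speculative-sampling component that CPM.cu builds on, this sketch implements the standard acceptance rule from Leviathan et al. (2023): accept the draft token with probability min(1, p[x]/q[x]), otherwise resample from the renormalized residual max(p - q, 0). CPM.cu's actual draft-and-verify pipeline may differ; all names below are illustrative.

```python
import numpy as np

def speculative_accept(p, q, draft_token, rng):
    """One step of the standard speculative-sampling acceptance rule.
    p, q: target and draft next-token distributions (1-D arrays summing
    to 1, assumed unequal so the residual mass is positive)."""
    x = draft_token
    if rng.random() < min(1.0, p[x] / max(q[x], 1e-12)):
        return x, True                  # draft token accepted as-is
    residual = np.maximum(p - q, 0.0)   # leftover target probability mass
    residual /= residual.sum()          # renormalize before resampling
    return rng.choice(len(p), p=residual), False

# Toy usage with a 5-token vocabulary.
rng = np.random.default_rng(0)
p = np.array([0.4, 0.3, 0.1, 0.1, 0.1])      # target model distribution
q = np.array([0.25, 0.25, 0.2, 0.15, 0.15])  # draft model distribution
token, accepted = speculative_accept(p, q, draft_token=0, rng=rng)
print(token, accepted)
```

The rule guarantees that accepted-or-resampled tokens are distributed exactly according to the target model, so the speedup comes for free in output quality.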
🔎 Similar Papers
No similar papers found.
Chaojun Xiao
Postdoctoral Researcher, Tsinghua University
Large Language Model
Yuxuan Li
MiniCPM Team
Xu Han
MiniCPM Team
Yuzhuo Bai
Tsinghua University
Natural Language Processing
Jie Cai
MiniCPM Team
Haotian Chen
University of California, Los Angeles
Political Economy · Non-market Strategy · American Politics
Wentong Chen
MiniCPM Team
Xin Cong
Tsinghua University
Tool Learning · Autonomous Agent · Large Language Model · Knowledge Graph
Ganqu Cui
Shanghai AI Lab
LLM Alignment · Reinforcement Learning
Ning Ding
MiniCPM Team
Shengda Fan
MiniCPM Team
Yewei Fang
Soochow University
Natural Language Processing · Large Language Model
Zixuan Fu
Nanyang Technological University
Image Restoration · Generative Models · Low-level Vision
Wenyu Guan
MiniCPM Team
Yitong Guan
MiniCPM Team
Junshao Guo
MiniCPM Team
Yufeng Han
MiniCPM Team
Bingxiang He
Second-year PhD Candidate, Tsinghua University
Natural Language Processing
Yuxiang Huang
Tsinghua University
Efficient AI · Natural Language Processing · Machine Learning System
Cunliang Kong
MiniCPM Team
Qiuzuo Li
MiniCPM Team
Siyuan Li
MiniCPM Team
Wenhao Li
MiniCPM Team
Yanghao Li
Apple
Computer Vision
Yishan Li
OpenBMB
Natural Language Processing · Large Language Model · Information Retrieval
Zhen Li
MiniCPM Team
Dan Liu
MiniCPM Team
Biyuan Lin
MiniCPM Team
Yankai Lin
Associate Professor (Tenure Track), Gaoling School of AI, Renmin University of China
Natural Language Processing · Large Language Models
Xiang Long
MiniCPM Team
Quanyu Lu
MiniCPM Team
Ya-Ting Lu
MiniCPM Team
Pei Luo
MiniCPM Team
Hongya Lyu
MiniCPM Team
Litu Ou
University of Edinburgh
Natural Language Processing · Machine Learning · Information Retrieval
Yinxu Pan
MiniCPM Team
Zekai Qu
MiniCPM Team
Qundong Shi
MiniCPM Team
Zijun Song
School of Advanced Interdisciplinary Sciences, University of the Chinese Academy of Sciences
Agent · RL
Jiayuan Su
Zhejiang University
LLM · Post-Training · Reasoning
Zhou Su
Xi'an Jiaotong University
Ao Sun
MiniCPM Team
Xianghui Sun
MiniCPM Team
Peijun Tang
MiniCPM Team
Fang-Ming Wang
MiniCPM Team
Feng Wang
MiniCPM Team
Shuo Wang
MiniCPM Team
Yudong Wang
MiniCPM Team
Yesai Wu
Tsinghua University, ModelBest Inc., Huazhong University of Science and Technology
Autonomous Agent · Tool Learning · Large Language Model
Zhenyu Xiao
MiniCPM Team
Jie Xie
MiniCPM Team
Zihao Xie
MiniCPM Team
Yukun Yan
Tsinghua University
Large Language Model
Jiarui Yuan
MiniCPM Team
Kaihuo Zhang
MiniCPM Team
Lei Zhang
MiniCPM Team
Linyue Zhang
MiniCPM Team
Xueren Zhang
MiniCPM Team
Yudi Zhang
MiniCPM Team
Hengyu Zhao
MiniCPM Team
Weilin Zhao
Tsinghua University
Natural Language Processing · Artificial Intelligence · Efficient LLM
Weilun Zhao
MiniCPM Team
Yuanqian Zhao
MiniCPM Team
Zhi Zheng
MiniCPM Team
Ge Zhou
MiniCPM Team
Jie Zhou
MiniCPM Team
Wei Zhou
MiniCPM Team
Zihan Zhou
MiniCPM Team
Zixuan Zhou
MiniCPM Team
Zhiyuan Liu
MiniCPM Team
Guoyang Zeng
MiniCPM Team
Chao Jia
Google DeepMind
Deep Learning · Computer Vision
Dahai Li
MiniCPM Team
Maosong Sun
Professor of Computer Science and Technology, Tsinghua University
Natural Language Processing · Artificial Intelligence · Social Computing