ERNIE 5.0 Technical Report

📅 2026-02-04
🤖 AI Summary
This report presents ERNIE 5.0, a natively autoregressive multimodal foundation model that handles text, images, video, and audio in a single model; the authors describe it as, to their knowledge, the first publicly disclosed production-scale trillion-parameter model of this kind. The model uses an ultra-sparse mixture-of-experts (MoE) architecture with modality-agnostic expert routing and is trained from scratch under a unified next-group-of-tokens prediction objective. To support deployment under varied resource constraints, an elastic training paradigm learns a family of sub-models within a single pre-training run, varying depth, expert capacity, and routing sparsity. Separately, the report addresses the stability and efficiency of reinforcement learning during post-training under ultra-sparse MoE and multimodal settings. The authors report strong, balanced performance across multimodal understanding and generation tasks.

📝 Abstract
In this report, we introduce ERNIE 5.0, a natively autoregressive foundation model designed for unified multimodal understanding and generation across text, image, video, and audio. All modalities are trained from scratch under a unified next-group-of-tokens prediction objective, based on an ultra-sparse mixture-of-experts (MoE) architecture with modality-agnostic expert routing. To address practical challenges in large-scale deployment under diverse resource constraints, ERNIE 5.0 adopts a novel elastic training paradigm. Within a single pre-training run, the model learns a family of sub-models with varying depths, expert capacities, and routing sparsity, enabling flexible trade-offs among performance, model size, and inference latency in memory- or time-constrained scenarios. Moreover, we systematically address the challenges of scaling reinforcement learning to unified foundation models, thereby guaranteeing efficient and stable post-training under ultra-sparse MoE architectures and diverse multimodal settings. Extensive experiments demonstrate that ERNIE 5.0 achieves strong and balanced performance across multiple modalities. To the best of our knowledge, among publicly disclosed models, ERNIE 5.0 represents the first production-scale realization of a trillion-parameter unified autoregressive model that supports both multimodal understanding and generation. To facilitate further research, we present detailed visualizations of modality-agnostic expert routing in the unified model, alongside comprehensive empirical analysis of elastic training, aiming to offer profound insights to the community.
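To make the routing idea in the abstract concrete, here is a minimal sketch of top-k expert routing in which the router sees only a token's hidden state and never a modality label. It is not ERNIE 5.0's implementation: the function names (`route_token`, `make_router`), the random router weights, the sizes, and the reading of "routing sparsity" as a smaller `top_k` are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(hidden, router_weights, top_k):
    """Pick top_k experts for one token from its hidden state alone.

    No modality label is passed in, which is how this sketch interprets
    "modality-agnostic" routing (an assumption, not the paper's code).
    """
    # One logit per expert: dot product of the token with that expert's router row.
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in router_weights]
    probs = softmax(logits)
    # Keep the top_k most probable experts and renormalise their gate weights.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    gate_sum = sum(probs[i] for i in chosen)
    return [(i, probs[i] / gate_sum) for i in chosen]


def make_router(num_experts, hidden_dim, seed=0):
    """Hypothetical router: one random weight row per expert."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(hidden_dim)] for _ in range(num_experts)]


if __name__ == "__main__":
    router = make_router(num_experts=16, hidden_dim=8)
    rng = random.Random(1)
    # Toy token stream mixing modalities; the labels are used for printing only.
    tokens = [(m, [rng.gauss(0.0, 1.0) for _ in range(8)])
              for m in ("text", "image", "video", "audio")]
    for modality, hidden in tokens:
        full = route_token(hidden, router, top_k=4)
        # A sparser sub-model is modelled here as a smaller top_k (assumption).
        slim = route_token(hidden, router, top_k=2)
        print(modality, [i for i, _ in full], [i for i, _ in slim])
```

Because both calls rank experts with the same router, the experts chosen at `top_k=2` are a prefix of those chosen at `top_k=4`, which is one simple way a smaller sub-model could reuse the full model's routing.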
Problem

Research questions and friction points this paper is trying to address.

multimodal foundation model
autoregressive modeling
mixture-of-experts
elastic training
reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

unified multimodal autoregressive model
ultra-sparse mixture-of-experts
modality-agnostic expert routing
elastic training paradigm
reinforcement learning scaling
Haifeng Wang
Baidu
NLP, MT, Search, Speech, Data Mining
Hua Wu
Baidu
Tian Wu
Academy of Mathematics and Systems Sciences, Chinese Academy of Sciences
Big Data, Data Analysis, Forecasting, Machine Learning, Energy Economics
Yu Sun
Baidu
Natural Language Processing, Deep Learning
Jing Liu
Baidu Inc.
Large Language Model, Information Retrieval, Agents
Dianhai Yu
Baidu
Deep Learning, Natural Language Processing, Machine Learning, Artificial intelligence
Yanjun Ma
Researcher at Baidu
Natural Language Processing, Machine Translation, Information Retrieval
Jingzhou He
Baidu
Zhongjun He
Baidu
Dou Hong
Xi'an Jiaotong-Liverpool University
Photovoltaic
Qiwen Liu
Baidu
Shuohuan Wang
Baidu
Natural Language Processing, Deep Learning
Junyuan Shang
Baidu NLP
Deep Learning, Natural Language Processing, Healthcare
Zhenyu Zhang
Baidu Inc.
Natural Language Processing, Large Language Model, Multimodal Language Model
Yuchen Ding
Baidu
Jinle Zeng
Baidu
Jiabin Yang
Baidu
Liang Shen
Baidu
Ruibiao Chen
Baidu
Weichong Yin
Baidu
Siyu Ding
Baidu
Dai Dai
Baidu
Natural Language Processing, Natural Language Understanding, Information Extraction, Text Mining, Sentiment Analysis
Shikun Feng
Baidu
nlp
Siqi Bao
Baidu
Natural Language Processing, Medical Image Analysis
Bolei He
Baidu
Yan Chen
Huawei Technologies
Wireless communications
Z. Jiao
Baidu
Ruiqing Zhang
Baidu
Zeyu Chen
Peking University, School of Basic Medical Sciences
Qingqing Dang
Baidu
Kaipeng Deng
Baidu
Jiajun Jiang
Baidu
Enlei Gong
Baidu
Guoxia Wang
Baidu
Yan-Hua Sha
Baidu
Yi Liu
Baidu Inc.
CV, LLM, VLM
Yehan Zheng
Baidu
Weijiang Xu
Baidu
Jiaxiang Liu
Zhejiang University
Multimodal Fusion, Medical Image Analysis
Zengfeng Zeng
Baidu
Yingqi Qu
Baidu
Zhongli Li
Baidu Inc.
natural language processing
Zhengkun Zhang
Baidu
Xiyang Wang
Baidu
Zixiang Xu
Baidu
Xinchao Xu
Baidu
Zhengjie Huang
Baidu Inc
Vision Language Model, Large Language Models, Graph Neural Network, Natural Language Processing
Dong Wang
Meituan
artificial intelligence, computer vision, robotics
Bingjin Chen
Baidu
Yue Chang
University of Toronto
Computer Graphics
Xing Yuan
Baidu
Shiwei Huang
Baidu
Qiao Zhao
Baidu
Xinzhe Ding
Baidu
Shuangshuang Qiao
Baidu
B. Yang
Baidu
Bihong Tang
Baidu
Bin Li
Baidu
Bingquan Wang
Baidu
Binhan Tang
Baidu
Binxiong Zheng
Baidu
Bo Cui
Eastern Institute of Technology, Ningbo
Nanofabrication, MEMS, electron beam and nanoimprint lithography
Bo Ke
Baidu
Bo Zhang
Baidu
Bo Zhang
Baidu
Boyan Zhang
Baidu
Boyang Liu
Baidu
Caiji Zhang
Baidu
Can Li
Baidu
Chang Xu
Senior Researcher, Microsoft Research Asia
Machine Learning, Time-series Analysis, Generative Modeling, AI for Healthcare
Chao Pang
Baidu
Chao Zhang
Baidu
Chaoyi Yuan
Baidu
Chen Chen
Baidu
Cheng Cui
BUAA
deep learning, network design, OCR, mllm
Chenlin Yin
Baidu
Chun Gan
Baidu
Chunguang Chai
Baidu
Chuyu Fang
Baidu
Cuiyun Han
Baidu
Dan Zhang
Baidu
Danlei Feng
Baidu
Danxiang Zhu
Baidu
Dong Sun
Baidu
Dongbo Li
Harbin Institute of Technology
Satellite network, intelligent network
Dongdong Li
Baidu
Dongdong Liu
Baidu
Dongxue Liu
Baidu
Fan Ding
Baidu
Fan Hu
Baidu
Fan Li
Baidu
Fan Mo
Baidu
Feisheng Wu
Baidu
Fengwei Liu
Baidu
Gangqiang Hu
Baidu
Gaofeng Lu
Baidu
Gaopeng Yong
Baidu
Gexiao Tian
Baidu
Guanzhong Wang
Baidu
Guangchen Ni
Baidu
Guangshuo Wu
Baidu
Guanzhong Wang
Baidu
Guihua Liu
Baidu
Guishun Li
Baidu
Haibin Li
Baidu
Hai-Yong Liang
Baidu
Haipeng Ming
Baidu
Haisu Wang
Baidu
Haiyang Lu
Baidu
Haiye Lin
Baidu
Han Zhou
Baidu
Hangting Lou
Baidu
Hanzhi Zhang
Baidu
Hao Chen
Baidu
Hao Du
ByteDance
Computer Vision, Machine Learning
Hao Liu
Baidu
Hao Zhou
Baidu
Haochen Jiang
Baidu
Haodong Tian
Baidu
Hongya Wang
Baidu
Hao Geng
Harvard University
Theoretical Physics
Heju Yin
Baidu
Hong Chen
Baidu
Hongchen Xue
Baidu
Hongen Liu
Baidu
Honggeng Zhang
Baidu
Hongji Xu
Baidu
Hongwei Chen
Baidu
Hongyan Zhang
Full Professor, China University of Geosciences (Wuhan)
High-Dimensional Data Processing, Remote Sensing, Agricultural Monitoring, Artificial Intelligence
Hongyuan Zhang
Baidu
Hua Lu
Baidu
Huan Chen
Shunfeng Technology Company Limited
Artificial Intelligence, Formal Methods
Huan Wang
Baidu
Huang He
Baidu
Natural Language Processing
Hui Liu
Amazon
Natural Language Processing, Large Language Models, Artificial Intelligence
Hui Zhong
The Hong Kong University of Science and Technology (Guangzhou)
data mining, urban science, sustainable transportation
H. Ruan
Baidu
Jiafeng Lu
Baidu
Jiage Liang
Baidu
Jiahao Hu
Baidu
Jiajie Yang
Baidu
Jialin Li
Baidu
Jian Chen
Alibaba Group
processor architecture, performance modeling, workload characterization
Jian Wu
Unknown affiliation
Music Generation
Jianfeng Yang
Baidu
Jian-Hui Jiang
Baidu
Jianhua Wang
Baidu
Jianye Chen
Baidu
Jiaodi Liu
Baidu
Jiarui Zhou
Baidu
Jiawei Lv
Baidu
Jiaxin Zhou
Baidu
Jiaxuan Liu
University of Science and Technology of China
Text-to-Speech, Speech LLM, AGI
Jie Han
Baidu
Jie Sun
Baidu
Jiefan Fang
Baidu
Jihan Liu
Baidu
Jihua Liu
Baidu
Jing Hu
Associate professor, School of Computer Science and Engineering, Xi'an University of Technology
hyperspectral image processing
Jing Qian
Baidu
Jing Yan
Baidu
Jingdong Du
Baidu
Jingdong Wang
Baidu
Jingjing Wu
Vis, Baidu inc
CV, MLLM
Jingyong Li
Baidu
Jinheng Wang
Baidu
Jinjin Li
Tsinghua University
friction, superlubricity, nanotribology, interface
Jinliang Lu
Baidu
Jinlin Yu
Baidu
Jinnan Liu
Baidu
Jixiang Feng
Baidu
Jiyi Huang
Baidu
Jiyuan Zhang
Peking University
Jun Liang
Cardiff University
J. Xia
Baidu
Jun Yu
Baidu
Junda Chen
Baidu
Junhao Feng
Baidu
Junhong Xiang
Baidu
Junliang Li
Baidu
Kai Liu
Unknown affiliation
Kailun Chen
Baidu
Kairan Su
Baidu
Kang Hu
Baidu
Kangkang Zhou
Baidu
Ke Chen
Baidu
Ke Wei
Fudan University
high dimensional signal processing and data analysis, reinforcement learning, nonconvex optimization
Kui Huang
Baidu
Kun Wu
Baidu
Kunbin Chen
Baidu
Lei Han
Baidu
Lei Sun
Microsoft Research Asia
OCR, document understanding, text detection
Lei Wen
Baidu
Linghui Meng
Institute of Automation, Chinese Academy of Sciences, China
Reinforcement Learning, Automatic Speech Recognition
Linhao Yu
Baidu
Liping Ouyang
Baidu
Liwen Zhang
Baidu
Longbin Ji
Baidu
Longzhi Wang
Baidu
Meng Sun
Professor, School of Mathematical Science, Peking University
software theory, formal methods, cyber-physical systems, coalgebra theory, trustworthy AI
Meng Tian
Baidu
Mengfei Li
Baidu
Mengqi Zeng
Baidu
Mengyu Zhang
University of Sheffield
Financial innovation, Financial behaviour, Financial market
Ming Hong
Baidu
Mingcheng Zhou
Baidu
Mingming Huang
Baidu
Mingxin Chen
Baidu
Mingzhu Cai
Baidu
Naibin Gu
Baidu
Nemin Qiu
Baidu
Nian Wang
UT Southwestern Medical Center
Brain MRI, Knee MRI, QSM, Diffusion MRI, Tractography
Peng Qiu
Baidu
Pengbo Zhao
Baidu
Peng Zou
Unknown affiliation
Qi Wang
Baidu
Qi Xin
Baidu
Qian Wang
Baidu
Qiang Zhu
Baidu
Qi-Zhi Luo
Baidu
Qianwei Yang
Baidu
Qianyue He
Baidu
Qifei Wu
Baidu
Qinrui Li
Baidu
Qiwen Bao
Baidu
Quan Zhang
Baidu
Quanxiang Liu
Baidu
Qunyi Xie
Baidu VIS
OCR, MLLM
Rong-Rong Zhan
Baidu
Rufeng Dai
Baidu
Rui Peng
Baidu
Ruian Liu
Baidu
Ruihao Xu
Baidu
Ruijie Wang
Baidu
Ruixi Zhang
Baidu
Ruixuan Liu
Baidu
Runsheng Shi
Baidu
Ruting Wang
Baidu
Senbo Kang
Baidu
S. Lu
Baidu
Shaofei Yu
Baidu
Shaotian Gong
Baidu
Shenwei Hu
Baidu
Shifeng Zheng
Baidu
Shihao Guo
Baidu
Shilong Fan
Baidu
Shiqin Liu
Baidu
Shiwei Gu
Baidu
Shixi Zhang
Baidu
Shuai Yao
Baidu
Shuang Zhang
Chair Professor, University of Hong Kong
metamaterials, topological photonics, metasurfaces, plasmonics, nonlinear optics
Shuang Liu
Baidu
Shuhao Liang
Baidu
Shuwei He
Baidu
Shuwen Yang
East China Normal University
Visual question answering, visual reasoning
Sijun He
Baidu
Siming Dai
Baidu
Siming Wu
Baidu
Si-Xiu Long
Baidu
Songhe Deng
Baidu
Suhui Dong
Baidu
Suyin Liang
Baidu
Teng Hu
Baidu
Tianchan Xu
Baidu
Tianliang Lv
Baidu
Tianmeng Yang
Baidu ERNIE, Peking University
LLM, RL, Machine Learning, Data Mining
Tianyi Wei
Research Fellow, MMLAB@NTU
Generative AI
Tiezhu Gao
Baidu
Ting Sun
Baidu
Ting Zhang
Baidu
Tingdan Luo
Baidu
Wei He
Baidu
Natural Language Processing
Wei Luan
Baidu
Wei Yin
Staff Research Scientist, Horizon Robotics
World Model, Generative AI, Physical AI
Wei Zhang
Baidu
Wei Zhou
Meta, GenAI
Multimodal LLM, speech and language processing
Weibao Gong
Baidu
Weibin Li
Baidu
Weicheng Huang
Baidu
Weichong Dang
Baidu
Weiguo Zhu
Baidu
Weilong Zhang
Baidu
Weiqi Tan
Baidu
Wen Huang
Baidu
Wenbin Chang
Baidu
Wenjing Du
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
machine learning, deep learning, signal processing, medical robot
Wenlong Miao
Baidu
Wenpei Luo
Baidu
Wenquan Wu
Baidu
Xi Shi
KU Leuven
Xi Zhao
Baidu
Xiang Gao
Baidu
Xiangguo Zhang
Baidu
Xiangrui Yu
Baidu
Xiangsen Wang
Baidu
Xiangzhe Wang
Baidu
Xianlong Luo
Baidu
Xianying Ma
Baidu
Xiao Tan
Baidu
AI, CV, ML
Xiaocong Lin
Baidu
Xiaofei Wang
Baidu
Xiaofeng Peng
Baidu
Xiaofeng Wu
Baidu
Xiaojian Xu
AI Scientist @ GE HealthCare; CS PhD @ Washington University in St. Louis
Computational Imaging, Deep Learning, Optimization, Computer Vision
Xiaolan Yuan
Baidu
Xiaopeng Cui
Baidu
Xiaotian Han
Research Scientist, OpenAI
Machine learning, Computer Vision, Multimodal, GenAI, LLM
Xiaoxiong Liu
Baidu
Xiaoxu Fei
Baidu
Xiaoxuan Wu
Baidu
Xiaoyu Wang
Baidu
Xiaoyu Zhang
Baidu
Xinjie Sun
Baidu
Xin Wang
Baidu
Xinhui Huang
Baidu
Xinming Zhu
Baidu
Xintong Yu
Tsinghua University
Vision-Language Multimodal Learning, Dialogue System
Xinyi Xu
Meta
data centric-machine learning, federated Learning, multi-agent systems, cooperative game theory
Xinyu Wang
Baidu
Xiuxian Li
Professor, Tongji University
Distributed control, Optimization, Game theory, Online learning, Unmanned systems
XuanShi Zhu
Baidu
Xue Xu
Harvard Medical School
Bioinformatics, cancer immunotherapy, human genetics analysis
Xueying Lv
Baidu
Xuhong Li
Baidu Inc
Explainable AI, Transfer Learning
Xulong Wei
Baidu
Xuyi Chen
Baidu
Yabing Shi
Baidu
Yafeng Wang
Baidu
Yamei Li
Baidu
Yan Liu
Baidu
Yanfu Cheng
Baidu
Yang Gao
Baidu
Yang Liang
Baidu
Yang Wang
Baidu
Yang Yang
Baidu
Yanlong Liu
Baidu
Ya-Na Fu
Baidu
Yanpeng Wang
Baidu
Yanzheng Lin
Baidu
Yao Chen
Baidu
Yaozong Shen
Baidu
Yaqian Han
Baidu
Yehua Yang
Baidu
Yekun Chai
Baidu
natural language processing, machine learning
Yesong Wang
Baidu
Yinnan Song
Baidu
Yichen Zhang
Baidu
Yifei Wang
Baidu
Yifeng Guo
St. Jude Children's Research Hospital; HKU
Yifeng Kou
Baidu
Yilong Chen
Baidu
Yilong Guo
Baidu
Yiming Wang
Baidu
Ying Chen
Baidu
Ying Wang
Baidu
Yingsheng Wu
Baidu
Yingzhan Lin
Baidu
Yin-Tang Yang
Baidu
Yiran Xing
Baidu
Yishu Lei
Baidu
Yixiang Tu
Baidu
Yiyan Chen
Baidu
Yong Zhang
Baidu
Yonghua Li
Baidu
Yongqiang Ma
Wuhan University
Scientific Information Mining, Large Language Models, AI for Science
Yongxing Dai
Baidu
Yongyue Zhang
Nanyang Technological University
Yu Ran
Baidu
Yu-Wen Michael Zhang
Baidu
Yuang Liu
Baidu
Yuanle Liu
Baidu
Yuan-Jie Zhou
Baidu
Yubo Zhang
Baidu
Yuchen Han
Baidu
Yucheng Wang
ETH Zürich
Multimodal LLM, Speech Understanding, Human-Computer Interaction
Yude Gao
Baidu
Yuedong Luo
Baidu
Yuehu Dong
Baidu
Yufeng Hu
Zhejiang University
Blockchain Security, Program Analysis, LLM
Yuhui Cao
Baidu
Yuhui Yun
Baidu
Yukun Chen
Pieces Technologies Inc.
Natural Language Processing
Yukun Gao
Baidu
Yukun Li
Baidu
Yumeng Zhang
Baidu
Yun Fan
Baidu
Yuntian Ma
Baidu
Yunfei Zhang
Baidu
Yunshen Xie
Baidu
Yuping Xu
Baidu
Yuqin Zhang
Baidu
Yuqing Liu
Baidu
Yurui Li
Baidu
Yuwen Wang
Baidu
Yuxiang Lu
Baidu
Zefeng Cai
Baidu
Ze-Xuan Zhao
Baidu
Zelun Zhang
Baidu
Zenan Lin
Baidu
Zezhao Dong
Baidu
Zhaowu Pan
Baidu
Zhaoyu Liu
Baidu
Zhensheng Dong
Baidu
Zhe Zhang
Baidu
Zhen Zhang
Baidu
Zhengfan Wu
Baidu
Zhengrui Wei
Baidu
Z. Ning
Baidu
Zhenxing Li
Baidu
Zhenyun Li
Baidu
Zhenyu Qian
Baidu
Zhenyun Li
Baidu
Zhi Li
Baidu
Zhichao Chen
Baidu
Zhicheng Dong
Baidu
Zhida Feng
Baidu
Zhifan Feng
Baidu
Zhihao Deng
Baidu
Zhijin Yu
Baidu
Zhiyang Chen
Baidu
Zhonghui Zheng
Baidu
Zhuangzhuang Guo
Baidu
Zhujun Zhang
Baidu
Zhuo Sun
Australian National University
Wireless Communications
Zichang Liu
Rice University
Zihan Lin
Researcher, Xiaohongshu
Recommender System
Zihao Huang
Baidu
Zihe Zhu
Baidu
Ziheng Zhao
Baidu
Ziping Chen
Baidu
Zixuan Zhu
Baidu
Ziyang Xu
The Chinese University of Hong Kong
AI for Science, Bioinformatics, Medical Image Processing
Ziyi Liang
Baidu
Ziyuan Gao
Baidu