Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inefficiency of large reasoning models in enterprise tasks caused by excessive or redundant reasoning. To mitigate this, the authors propose Yuan3.0 Flash, an open-source multimodal large language model built upon a Mixture-of-Experts architecture with 40 billion total parameters and 3.7 billion activated per inference. The model incorporates a novel reinforcement learning algorithm, Reflection-aware Adaptive Policy Optimization (RAPO), which dynamically regulates the reasoning process to suppress unnecessary computation. Using only approximately one-fourth to one-half as many tokens on average as comparable frontier models, Yuan3.0 Flash achieves state-of-the-art performance on enterprise-oriented tasks such as retrieval-augmented generation, complex table understanding, and summarization, while maintaining competitive accuracy on mathematical and scientific reasoning benchmarks, effectively balancing computational efficiency with broad generalization capability.
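The page does not give RAPO's actual formulation, only that it regulates reflection during RL training to curb overthinking. As a purely illustrative sketch of that idea, one could shape the task reward with a penalty on reflection markers beyond a small budget; the marker list, penalty scheme, and all names below are hypothetical, not the authors' method:

```python
# Hypothetical reflection-penalty reward shaping, in the spirit of (but not
# identical to) RAPO as described above. All constants are illustrative.
REFLECTION_MARKERS = ("wait", "let me re-check", "on second thought")

def shaped_reward(task_reward: float, response: str,
                  penalty: float = 0.1, budget: int = 1) -> float:
    """Subtract a penalty for each reflection phrase beyond the budget,
    discouraging redundant re-reasoning while permitting some reflection."""
    text = response.lower()
    n_reflections = sum(text.count(m) for m in REFLECTION_MARKERS)
    excess = max(0, n_reflections - budget)
    return task_reward - penalty * excess
```

In an actual RL loop this shaped reward would replace the raw task reward when computing policy-gradient advantages, so the policy learns to reflect only when it pays off.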

📝 Abstract
We introduce Yuan3.0 Flash, an open-source Mixture-of-Experts (MoE) MultiModal Large Language Model featuring 3.7B activated parameters and 40B total parameters, specifically designed to enhance performance on enterprise-oriented tasks while maintaining competitive capabilities on general-purpose tasks. To address the overthinking phenomenon commonly observed in Large Reasoning Models (LRMs), we propose Reflection-aware Adaptive Policy Optimization (RAPO), a novel RL training algorithm that effectively regulates overthinking behaviors. In enterprise-oriented tasks such as retrieval-augmented generation (RAG), complex table understanding, and summarization, Yuan3.0 Flash consistently achieves superior performance. Moreover, it demonstrates strong reasoning capabilities in domains such as mathematics and science, attaining accuracy comparable to frontier models while requiring only approximately 1/4 to 1/2 of the average tokens. Yuan3.0 Flash has been fully open-sourced to facilitate further research and real-world deployment: https://github.com/Yuan-lab-LLM/Yuan3.0.
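The 3.7B-of-40B figure reflects the standard sparse-MoE mechanism: a gating network scores all experts per token, but only the top-k are executed, so activated parameters are a small fraction of the total. A generic top-k router sketch follows; the expert count, k, and gating function are generic assumptions, not Yuan3.0 Flash's actual configuration:

```python
import math

def topk_router(logits: list[float], k: int = 2) -> dict[int, float]:
    """Generic sparse-MoE routing: softmax over expert logits, keep the
    top-k experts, and renormalize their gate weights to sum to 1."""
    # Numerically stable softmax over all expert logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Select the k highest-probability experts.
    top = sorted(range(len(logits)), key=lambda i: -probs[i])[:k]
    # Renormalize so the selected gates sum to 1; only these experts run.
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}
```

With, say, 2 experts active out of many, the forward pass touches only the selected experts' weights, which is how a 40B-parameter model can run with roughly 3.7B parameters per token.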
Problem

Research questions and friction points this paper is trying to address.

overthinking
Large Reasoning Models
enterprise applications
Mixture-of-Experts
multimodal
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Experts
Reflection-aware Adaptive Policy Optimization
overthinking mitigation
enterprise-oriented multimodal LLM
retrieval-augmented generation
Shawn Wu
YuanLab.ai
Sean Wang
Southern Methodist University, Cox School of Business, Accounting
Bias, Discrimination, Capital Markets, Information Processing, Financial Analysts
Louie Li
YuanLab.ai
Darcy Chen
YuanLab.ai
Allen Wang
YuanLab.ai
Jiangang Luo
YuanLab.ai
Xudong Zhao
YuanLab.ai
Joseph Shen
YuanLab.ai
Gawain Ma
YuanLab.ai
Jasper Jia
YuanLab.ai
Marcus Mao
YuanLab.ai
Claire Wang
YuanLab.ai
Hunter He
YuanLab.ai
Carol Wang
YuanLab.ai
Zera Zhang
YuanLab.ai
Jason Wang
Harvard University
Machine Learning
Chonly Shen
YuanLab.ai
Leo Zhang
YuanLab.ai
Logan Chen
YuanLab.ai
Qasim Meng
YuanLab.ai
James Gong
YuanLab.ai
Danied Zhao
YuanLab.ai
Penn Zheng
YuanLab.ai
Owen Zhu
YuanLab.ai
Tong Yu
Adobe Research