Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inefficiency of large reasoning models in enterprise tasks caused by excessive or redundant reasoning. To mitigate this, the authors propose Yuan3.0 Flash, an open-source multimodal large language model built upon a Mixture-of-Experts architecture with 40 billion total parameters and 3.7 billion activated per inference. The model incorporates a novel reinforcement learning algorithm, Reflection-aware Adaptive Policy Optimization (RAPO), which dynamically regulates the reasoning process to suppress unnecessary computation. Using only approximately one-fourth to one-half as many tokens on average as comparable frontier models, Yuan3.0 Flash achieves state-of-the-art performance on enterprise-oriented tasks such as retrieval-augmented generation, complex table understanding, and summarization, while maintaining competitive accuracy on mathematical and scientific reasoning benchmarks, effectively balancing computational efficiency with broad generalization capability.
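The page does not give RAPO's actual formulation, only that it regulates reflection during RL training to curb overthinking. As a purely illustrative sketch of that idea, one could shape the task reward with a penalty on reflection markers beyond a small budget; the marker list, penalty scheme, and all names below are hypothetical, not the authors' method:

```python
# Hypothetical reflection-penalty reward shaping, in the spirit of (but not
# identical to) RAPO as described above. All constants are illustrative.
REFLECTION_MARKERS = ("wait", "let me re-check", "on second thought")

def shaped_reward(task_reward: float, response: str,
                  penalty: float = 0.1, budget: int = 1) -> float:
    """Subtract a penalty for each reflection phrase beyond the budget,
    discouraging redundant re-reasoning while permitting some reflection."""
    text = response.lower()
    n_reflections = sum(text.count(m) for m in REFLECTION_MARKERS)
    excess = max(0, n_reflections - budget)
    return task_reward - penalty * excess
```

In an actual RL loop this shaped reward would replace the raw task reward when computing policy-gradient advantages, so the policy learns to reflect only when it pays off.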

📝 Abstract
We introduce Yuan3.0 Flash, an open-source Mixture-of-Experts (MoE) MultiModal Large Language Model featuring 3.7B activated parameters and 40B total parameters, specifically designed to enhance performance on enterprise-oriented tasks while maintaining competitive capabilities on general-purpose tasks. To address the overthinking phenomenon commonly observed in Large Reasoning Models (LRMs), we propose Reflection-aware Adaptive Policy Optimization (RAPO), a novel RL training algorithm that effectively regulates overthinking behaviors. In enterprise-oriented tasks such as retrieval-augmented generation (RAG), complex table understanding, and summarization, Yuan3.0 Flash consistently achieves superior performance. Moreover, it demonstrates strong reasoning capabilities in domains such as mathematics and science, attaining accuracy comparable to frontier models while requiring only approximately 1/4 to 1/2 of the average tokens. Yuan3.0 Flash has been fully open-sourced to facilitate further research and real-world deployment: https://github.com/Yuan-lab-LLM/Yuan3.0.
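The 3.7B-of-40B figure reflects the standard sparse-MoE mechanism: a gating network scores all experts per token, but only the top-k are executed, so activated parameters are a small fraction of the total. A generic top-k router sketch follows; the expert count, k, and gating function are generic assumptions, not Yuan3.0 Flash's actual configuration:

```python
import math

def topk_router(logits: list[float], k: int = 2) -> dict[int, float]:
    """Generic sparse-MoE routing: softmax over expert logits, keep the
    top-k experts, and renormalize their gate weights to sum to 1."""
    # Numerically stable softmax over all expert logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Select the k highest-probability experts.
    top = sorted(range(len(logits)), key=lambda i: -probs[i])[:k]
    # Renormalize so the selected gates sum to 1; only these experts run.
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}
```

With, say, 2 experts active out of many, the forward pass touches only the selected experts' weights, which is how a 40B-parameter model can run with roughly 3.7B parameters per token.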
Problem

Research questions and friction points this paper is trying to address.

overthinking
Large Reasoning Models
enterprise applications
Mixture-of-Experts
multimodal
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Experts
Reflection-aware Adaptive Policy Optimization
overthinking mitigation
enterprise-oriented multimodal LLM
retrieval-augmented generation
Shawn Wu
YuanLab.ai
Sean Wang
Southern Methodist University, Cox School of Business, Accounting
Bias, Discrimination, Capital Markets, Information Processing, Financial Analysts
Louie Li
YuanLab.ai
Darcy Chen
YuanLab.ai
Allen Wang
YuanLab.ai
Jiangang Luo
YuanLab.ai
Xudong Zhao
YuanLab.ai
Joseph Shen
YuanLab.ai
Gawain Ma
YuanLab.ai
Jasper Jia
YuanLab.ai
Marcus Mao
YuanLab.ai
Claire Wang
YuanLab.ai
Hunter He
YuanLab.ai
Carol Wang
YuanLab.ai
Zera Zhang
YuanLab.ai
Jason Wang
Harvard University
Machine Learning
Chonly Shen
YuanLab.ai
Leo Zhang
YuanLab.ai
Logan Chen
YuanLab.ai
Qasim Meng
YuanLab.ai
James Gong
YuanLab.ai
Danied Zhao
YuanLab.ai
Penn Zheng
YuanLab.ai
Owen Zhu
YuanLab.ai
Tong Yu
Adobe Research