BlueLM-2.5-3B Technical Report

📅 2025-07-08
🤖 AI Summary
To address the challenges of deploying multimodal large language models (MLLMs) on edge devices (excessive parameter counts, high inference overhead, and limited reasoning capability), this paper introduces BlueLM-2.5-3B, a compact MLLM with only 2.9B parameters. It is the first 3B-scale MLLM to support explicit, controllable "think/non-think" dual-mode inference, enabling dynamic allocation of the reasoning token budget. The model is developed through diversified data curation, resampling of key samples, hybrid heterogeneous reinforcement learning, and an efficient training infrastructure, achieving strong performance with substantially less training data. Specifically, BlueLM-2.5-3B matches Qwen3-4B on text-only tasks, trails the larger Kimi-VL-A3B-16B by only about 5% on average across multimodal benchmarks, and outperforms Qwen2.5-VL-3B on most multimodal benchmarks in non-thinking mode. The model thus strikes a favorable balance between edge-deployment efficiency, general multimodal understanding, and structured reasoning capability.
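
The page gives no interface details for the dual-mode inference. As a rough illustration of what budget-controlled think/non-think decoding could look like, the sketch below uses hypothetical tag names (`<think>`, `</think>`) and a toy generator; none of these names come from the report, and the real model's control tokens may differ.

```python
# Hedged sketch of dual-mode ("think"/"non-think") decoding with an explicit
# thinking-token budget. Tag names and the toy generator are illustrative
# placeholders, not BlueLM-2.5-3B's actual interface.
from typing import Callable

THINK_OPEN, THINK_CLOSE = "<think>", "</think>"

def build_prompt(question: str, thinking: bool) -> str:
    # Non-think mode pre-closes an empty reasoning block so the model
    # answers directly; think mode leaves the block open.
    tail = THINK_OPEN if thinking else THINK_OPEN + THINK_CLOSE
    return f"{question}\n{tail}"

def generate(step: Callable[[str], str], question: str, thinking: bool = True,
             think_budget: int = 128, max_tokens: int = 512) -> str:
    """Greedy decoding loop; once `think_budget` reasoning tokens are spent,
    THINK_CLOSE is force-emitted so the answer phase begins."""
    text = build_prompt(question, thinking)
    in_think, spent = thinking, 0
    for _ in range(max_tokens):
        tok = step(text)
        if in_think:
            if tok == THINK_CLOSE or spent >= think_budget:
                tok, in_think = THINK_CLOSE, False
            spent += 1
        text += tok
        if tok == "<eos>":
            break
    return text

# Toy stand-in for a real model so the sketch runs end to end.
def toy_step(prefix: str) -> str:
    if THINK_CLOSE not in prefix:
        return " step"        # model would keep reasoning indefinitely
    if not prefix.endswith("42."):
        return " 42."         # answer token after the reasoning block closes
    return "<eos>"

print(generate(toy_step, "What is 6*7?", think_budget=8))
```

With `thinking=False`, the prompt already contains a closed reasoning block and the loop emits the answer directly; with `thinking=True`, the budget cap bounds worst-case latency, which is the point of explicit token-budget control on edge hardware.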

📝 Abstract
We present BlueLM-2.5-3B, a compact and unified dense Multimodal Large Language Model (MLLM) designed for efficient edge-device deployment, offering strong general-purpose and reasoning capabilities. To the best of our knowledge, this is the first 3B-scale MLLM to support both thinking and non-thinking modes, while also enabling explicit control over thinking token budget. BlueLM-2.5-3B is developed through diversified data curation, key data resampling, hybrid heterogeneous reinforcement learning, and a high-performance training infrastructure. Our model achieves superior multimodal capacity while preserving competitive pure-text performance with only 2.9 billion parameters. We conduct comprehensive evaluations across a broad range of multimodal and text-only benchmarks. In thinking mode, BlueLM-2.5-3B achieves comparable performance to Qwen3-4B on text-only benchmarks, and trails the larger Kimi-VL-A3B-16B by only about 5% on average across multimodal evaluations. In non-thinking mode, it outperforms Qwen2.5-VL-3B on the majority of multimodal benchmarks. Additionally, BlueLM-2.5-3B exhibits exceptional data efficiency. All of the aforementioned performance is achieved with substantially less total training data than Qwen2.5-VL-3B and Qwen3-4B. We hope our work contributes to the advancement of high-performance, on-device MLLMs and provides meaningful insights to the research community.
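
The abstract mentions "key data resampling" without detailing the scheme. As one common pattern it could resemble, the sketch below up-weights samples the current model answers incorrectly so they recur more often in the next training epoch; the weighting rule and all names are assumptions for illustration, not the paper's method.

```python
# Hedged sketch of key-data resampling: draw the next training epoch with
# hard samples (low measured pass rate) appearing more often. The weighting
# rule is an illustrative assumption, not the report's actual scheme.
import random
from typing import List

def resample(dataset: List[str], pass_rate: List[float],
             epoch_size: int, boost: float = 3.0, seed: int = 0) -> List[str]:
    """Weight each sample by 1 + boost * (1 - pass_rate): always-correct
    samples keep weight 1.0; always-missed samples get weight 1 + boost."""
    weights = [1.0 + boost * (1.0 - p) for p in pass_rate]
    return random.Random(seed).choices(dataset, weights=weights, k=epoch_size)

data = ["easy-1", "easy-2", "hard-1"]
rates = [0.95, 0.90, 0.10]                    # per-sample pass rate from eval
print(resample(data, rates, epoch_size=10))   # "hard-1" is oversampled
```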
Problem

Research questions and friction points this paper is trying to address.

Develop a compact multimodal LLM for edge devices
Enable thinking and non-thinking modes with token control
Achieve high performance with minimal training data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compact 3B-scale MLLM for edge devices
Hybrid heterogeneous reinforcement learning (see the sketch after this list)
Explicit control over thinking token budget
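
Neither the summary nor the abstract explains the hybrid heterogeneous RL scheme. One common reading of "hybrid" is mixing rule-verifiable rewards (e.g., checkable math answers) with reward-model scores for open-ended text in the same policy-update batch; the sketch below illustrates only that reward routing, under those assumptions, and every function is a stub rather than the paper's implementation.

```python
# Hedged sketch: route heterogeneous rollouts to different reward sources,
# then normalize rewards within the group (a GRPO-style baseline). All of
# the functions below are stubs, not the paper's implementation.
from statistics import mean, pstdev
from typing import Dict, List

def verifier_reward(pred: str, gold: str) -> float:
    """Rule-based check for verifiable tasks (exact-match stand-in)."""
    return 1.0 if pred.strip() == gold.strip() else 0.0

def reward_model(pred: str) -> float:
    """Stand-in for a learned scalar reward model on open-ended text."""
    return min(len(pred) / 100.0, 1.0)   # toy heuristic, not a real RM

def batch_rewards(rollouts: List[Dict]) -> List[float]:
    # Heterogeneous routing: each sample's task type picks its reward source.
    return [verifier_reward(r["pred"], r["gold"]) if r["task"] == "verifiable"
            else reward_model(r["pred"]) for r in rollouts]

def group_advantages(rewards: List[float]) -> List[float]:
    """Normalize rewards within a rollout group to get advantages."""
    mu, sigma = mean(rewards), pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]

rollouts = [
    {"task": "verifiable", "pred": "42", "gold": "42"},
    {"task": "verifiable", "pred": "41", "gold": "42"},
    {"task": "open", "pred": "A concise, helpful explanation.", "gold": ""},
]
print(group_advantages(batch_rewards(rollouts)))
```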
Baojiao Xiong (vivo AI Lab)
Boheng Chen (vivo AI Lab)
Chengzhi Wang (vivo AI Lab)
Daxiong Luo (vivo AI Lab)
Dongsheng Xu (vivo AI Lab)
Dongyang Liu (MMLab, CUHK)
Fan Yang (vivo AI Lab)
Fangyuan Li (vivo AI Lab)
Fei Teng (Imperial College London)
Feng Wang (vivo AI Lab)
Fukang Qin (vivo AI Lab)
Fuquan Peng (vivo AI Lab)
Guanxin Tan (vivo AI Lab)
Guozhi Wang (vivo AI Lab)
Haibo Yu (vivo AI Lab)
Haohao Gao (vivo AI Lab)
Heng Liu (Guangxi Minzu University)
Hongbo Yang (vivo AI Lab)
Hongjian Zou (vivo AI Lab)
Houzheng Shen (vivo AI Lab)
Hu Meng (vivo AI Lab)
Huan Li (vivo AI Lab)
Hui Tan (vivo AI Lab)
Jiali Chen (Apple)
Jianzhao Chen (vivo AI Lab)