We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning

📅 2025-08-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current multimodal large language models (MLLMs) still struggle with complex mathematical reasoning, in part because existing work overlooks knowledge-driven system design and model-centric data space modeling. To address this, the paper proposes We-Math 2.0, a unified system built around the MathBook Knowledge System, a five-level hierarchy covering 491 knowledge points and 1,819 fundamental principles. It introduces MathBook-Standard for broad conceptual coverage and MathBook-Pro, a difficulty-graded dataset with 7 progressive variants per problem. Training uses MathBook-RL, a two-stage paradigm that combines cold-start fine-tuning on knowledge-oriented chain-of-thought reasoning with progressive alignment RL based on average-reward learning and dynamic data scheduling. On four widely used benchmarks, MathBook-RL performs competitively with existing baselines, and it achieves strong results on the newly introduced MathBookEval, suggesting promising generalization. The core contribution is the integration of structured knowledge modeling, model-centric data space construction, and RL-based training in a single multimodal mathematical reasoning system.

📝 Abstract
Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities across various tasks, but still struggle with complex mathematical reasoning. Existing research primarily focuses on dataset construction and method optimization, often overlooking two critical aspects: comprehensive knowledge-driven design and model-centric data space modeling. In this paper, we introduce We-Math 2.0, a unified system that integrates a structured mathematical knowledge system, model-centric data space modeling, and a reinforcement learning (RL)-based training paradigm to comprehensively enhance the mathematical reasoning abilities of MLLMs. The key contributions of We-Math 2.0 are fourfold: (1) MathBook Knowledge System: We construct a five-level hierarchical system encompassing 491 knowledge points and 1,819 fundamental principles. (2) MathBook-Standard & Pro: We develop MathBook-Standard, a dataset that ensures broad conceptual coverage and flexibility through dual expansion. Additionally, we define a three-dimensional difficulty space and generate 7 progressive variants per problem to build MathBook-Pro, a challenging dataset for robust training. (3) MathBook-RL: We propose a two-stage RL framework comprising: (i) Cold-Start Fine-tuning, which aligns the model with knowledge-oriented chain-of-thought reasoning; and (ii) Progressive Alignment RL, leveraging average-reward learning and dynamic data scheduling to achieve progressive alignment across difficulty levels. (4) MathBookEval: We introduce a comprehensive benchmark covering all 491 knowledge points with diverse reasoning step distributions. Experimental results show that MathBook-RL performs competitively with existing baselines on four widely-used benchmarks and achieves strong results on MathBookEval, suggesting promising generalization in mathematical reasoning.
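The abstract does not say how the 7 variants per problem are derived from the three-dimensional difficulty space. One reading that matches the count is that each dimension can be either left at its base level or raised, which gives 2^3 - 1 = 7 combinations other than the original problem. The sketch below enumerates those combinations under that assumption; the axis names and the function `difficulty_variants` are hypothetical placeholders, not names taken from the paper.

```python
from itertools import combinations

# Hypothetical axis names: the abstract only says the space has three dimensions.
AXES = ("axis_1", "axis_2", "axis_3")


def difficulty_variants(axes=AXES):
    """Return every non-empty subset of axes, smallest subsets first.

    Each subset names the dimensions that are made harder in one variant.
    Assumption: a dimension is either at its base level or raised.
    """
    variants = []
    for k in range(1, len(axes) + 1):
        variants.extend(combinations(axes, k))
    return variants


variants = difficulty_variants()
print(len(variants))  # 7
```

Ordering the subsets by size gives a natural easy-to-hard progression, with the last variant raising all three dimensions at once.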
Problem

Research questions and friction points this paper is trying to address.

Enhancing MLLMs' complex mathematical reasoning abilities
Addressing gaps in knowledge-driven design and data modeling
Developing a unified system with structured knowledge and RL training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structured hierarchical math knowledge system
Model-centric progressive difficulty datasets
Two-stage reinforcement learning training framework
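
To make "average-reward learning" and "dynamic data scheduling" concrete, here is a minimal sketch of one generic way such pieces can fit together: rewards for a group of related problems are averaged, and a simple threshold rule decides whether training moves to a harder level. This is an illustration under stated assumptions, not the paper's algorithm. The function names, the 0/1 reward convention, and the `threshold` value are all hypothetical.

```python
from statistics import mean


def average_reward(rewards):
    """Mean reward over one group of related problems (0.0 for an empty group)."""
    return mean(rewards) if rewards else 0.0


def next_level(level, group_rewards, max_level, threshold=0.5):
    """Threshold-based curriculum step (assumed rule, not from the paper).

    Move up one difficulty level when the group's mean reward reaches
    the threshold; otherwise stay at the current level.
    """
    if level < max_level and average_reward(group_rewards) >= threshold:
        return level + 1
    return level


# Example: 3 of 4 answers correct at level 0, so the schedule advances.
print(next_level(0, [1, 0, 1, 1], max_level=7))  # 1
```

Averaging over a group rather than scoring a single problem rewards consistent behavior across variants, which is one plausible motivation for the design described in the abstract.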
Authors

Runqi Qiao
BUPT
Qiuna Tan
BUPT
Peiqing Yang
Nanyang Technological University
Computer Vision, Image Processing, Machine Learning
Yanzi Wang
Tsinghua University
Xiaowan Wang
BUPT
Enhui Wan
BUPT
Sitong Zhou
BUPT
Guanting Dong
Renmin University of China
LLM Reasoning & Alignment, Deep Search Agent, Agentic RL
Yuchen Zeng
Microsoft Research
Machine Learning, Artificial Intelligence, Algorithms
Yida Xu
BUPT
Jie Wang
BUPT
Chong Sun
Tencent WeChat
Computer Vision
Chen Li
WeChat Vision, Tencent Inc.
Honggang Zhang
BUPT