🤖 AI Summary
Multimodal mathematical reasoning faces three challenges: scarce chain-of-thought (CoT) data, weak model verification capability, and poor generalization. Method: We introduce MMathCoT-1M—the first high-quality multimodal CoT instruction-tuning dataset, built with a three-module synthesis strategy (CoT distillation, trajectory-format rewriting, and format unification)—and propose a two-stage training framework: (1) train the reasoning model URSA-7B on MMathCoT-1M; (2) train the verifier URSA-RM-7B on DualMath-1.1M, a large-scale process-annotation dataset generated automatically from URSA-7B's own multimodal CoT traces, supervising both interpretation and logic. This second stage bridges the gap between CoT reasoning and robust verification. Contribution/Results: URSA-7B achieves state-of-the-art performance on multiple multimodal mathematical benchmarks, and URSA-RM-7B significantly improves test-time inference reliability while showing strong out-of-distribution (OOD) verification capability. All models, datasets, and code are publicly released.
📝 Abstract
Chain-of-thought (CoT) reasoning has been widely applied to mathematical reasoning in Large Language Models (LLMs). Recently, the introduction of process supervision on CoT trajectories has sparked discussion of test-time scaling, thereby boosting the potential of these models. However, in multimodal mathematical reasoning, the scarcity of high-quality CoT training data has kept existing models from achieving high-precision CoT reasoning and has limited the realization of their reasoning potential at test time. In this work, we propose a three-module synthesis strategy that integrates CoT distillation, trajectory-format rewriting, and format unification, yielding MMathCoT-1M, a high-quality CoT reasoning instruction fine-tuning dataset for multimodal mathematics. We comprehensively validate the state-of-the-art (SOTA) performance of the trained URSA-7B model on multiple multimodal mathematical benchmarks. For test-time scaling, we introduce a data synthesis strategy that automatically generates process-annotation data, yielding DualMath-1.1M, which supervises both interpretation and logic. By further training URSA-7B on DualMath-1.1M, we move from CoT reasoning capability to robust process supervision. The resulting URSA-RM-7B acts as a verifier, effectively improving URSA-7B's performance at test time. URSA-RM-7B also demonstrates excellent out-of-distribution (OOD) verification, showcasing its generalization. Model weights, training data, and code will be open-sourced.
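To illustrate the verifier-at-test-time idea, the sketch below shows best-of-N reranking: sample several CoT candidates from a reasoning model and keep the one the verifier scores highest. This is a minimal illustration, not the paper's implementation; `toy_verifier` is a hypothetical stand-in for a trained verifier such as URSA-RM-7B, which in practice would score each reasoning step of traces sampled from URSA-7B.

```python
# Hedged sketch of verifier-guided best-of-N selection at test time.
# `toy_verifier` is a toy stand-in for a trained verifier (e.g. a
# reward model like URSA-RM-7B); real candidates would be CoT traces
# sampled from the reasoning model.

from typing import Callable, List


def best_of_n(candidates: List[str],
              score_solution: Callable[[str], float]) -> str:
    """Return the candidate CoT trace the verifier scores highest."""
    return max(candidates, key=score_solution)


# Toy scoring rule: reward traces ending in the correct final answer,
# with a tiny length bonus as a tie-breaker.
def toy_verifier(trace: str) -> float:
    return float(trace.endswith("answer: 4")) + 0.01 * len(trace)


candidates = [
    "2 + 2 = 5, so answer: 5",   # wrong final answer
    "2 + 2 = 4, so answer: 4",   # correct trace, scored highest
    "guessing... answer: 3",
]
print(best_of_n(candidates, toy_verifier))  # prints "2 + 2 = 4, so answer: 4"
```

More samples generally raise the chance that some candidate is correct; the verifier's job is to find it, which is why verification quality (including OOD) directly bounds the gains from test-time scaling.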