M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning

📅 2025-07-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal large language models (MLLMs), even when enhanced with Reinforcement Learning with Verifiable Rewards (RLVR), still struggle with dynamic spatial interactions, a capability essential for real-world applications. To bridge this gap, the authors introduce M2-Reasoning-7B, a model designed to excel at both general and spatial reasoning. The approach combines two key innovations: (1) a data pipeline that produces 294.2K high-quality samples (168K for cold-start fine-tuning, 126.2K for RLVR) featuring logically coherent reasoning trajectories; and (2) a dynamic multi-task training strategy with step-wise optimization to mitigate conflicts between data, paired with task-specific rewards that deliver tailored incentive signals. With this combination of curated data and staged training, M2-Reasoning-7B sets a new state of the art across 8 benchmarks spanning both general and spatial reasoning.

📝 Abstract
Recent advancements in Multimodal Large Language Models (MLLMs), particularly through Reinforcement Learning with Verifiable Rewards (RLVR), have significantly enhanced their reasoning abilities. However, a critical gap persists: these models struggle with dynamic spatial interactions, a capability essential for real-world applications. To bridge this gap, we introduce M2-Reasoning-7B, a model designed to excel in both general and spatial reasoning. Our approach integrates two key innovations: (1) a novel data pipeline that generates 294.2K high-quality data samples (168K for cold-start fine-tuning and 126.2K for RLVR), which feature logically coherent reasoning trajectories and have undergone comprehensive assessment; and (2) a dynamic multi-task training strategy with step-wise optimization to mitigate conflicts between data, and task-specific rewards for delivering tailored incentive signals. This combination of curated data and advanced training allows M2-Reasoning-7B to set a new state-of-the-art (SOTA) across 8 benchmarks, showcasing superior performance in both general and spatial reasoning domains.
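The abstract's "dynamic multi-task training strategy with step-wise optimization to mitigate conflicts between data" suggests interleaving batches from different tasks rather than training on each task in long single-task runs. A minimal, hypothetical sketch of such a scheduler (the function name and round-robin policy are illustrative assumptions, not the paper's actual implementation):

```python
from itertools import zip_longest

def interleave_tasks(task_streams):
    """Round-robin mix of per-task batch streams (illustrative).

    task_streams: dict mapping task name -> list of batches.
    Alternating tasks within each training round avoids long
    single-task stretches, one simple way to reduce data conflicts
    in multi-task fine-tuning.
    """
    rounds = zip_longest(*task_streams.values())  # one batch per task per round
    return [batch for rnd in rounds for batch in rnd if batch is not None]
```

For example, mixing three general-reasoning batches with two spatial-reasoning batches yields an alternating schedule that exhausts the shorter stream gracefully.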
Problem

Research questions and friction points this paper is trying to address.

MLLMs lack dynamic spatial interaction capabilities
Need unified general and spatial reasoning in MLLMs
Existing models struggle with real-world spatial applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Data pipeline generates 294.2K high-quality samples (168K for cold-start fine-tuning, 126.2K for RLVR)
Uses dynamic multi-task training strategy
Integrates step-wise optimization and task-specific rewards
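The "task-specific rewards" above imply that RLVR rollouts are scored by different verifiers depending on the task. A hedged sketch of what such a reward router might look like (task names, the boxed-answer convention, and the graded distance reward are assumptions for illustration, not the paper's actual reward functions):

```python
import re

def verifiable_reward(task_type: str, response: str, ground_truth) -> float:
    """Dispatch a model response to a task-specific verifier (illustrative)."""
    if task_type == "general":
        # Binary reward: exact match on a final \boxed{...} answer,
        # a common RLVR recipe for general reasoning tasks.
        m = re.search(r"\\boxed\{([^}]*)\}", response)
        return 1.0 if m and m.group(1).strip() == str(ground_truth) else 0.0
    if task_type == "spatial_distance":
        # Graded reward: 1 minus relative error against a numeric target,
        # clipped at 0, so near-misses still earn partial credit.
        try:
            pred = float(response.strip())
        except ValueError:
            return 0.0
        rel_err = abs(pred - ground_truth) / max(abs(ground_truth), 1e-6)
        return max(0.0, 1.0 - rel_err)
    raise ValueError(f"unknown task type: {task_type}")
```

Binary exact-match suits tasks with a single checkable answer, while a graded signal gives the policy a smoother gradient on continuous spatial estimates.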