MIND: Multi-rationale INtegrated Discriminative Reasoning Framework for Multi-modal Large Models

📅 2025-12-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address shallow semantic modeling, weak logical robustness, and susceptibility to misleading rationales in multimodal large language models (MLLMs) during complex reasoning, this paper proposes the MIND reasoning framework, which instantiates a three-stage cognitive mechanism of "Understand → Rethink → Correct". Methodologically, it introduces: (1) a Rationale Augmentation and Discrimination (RAD) paradigm for automatically expanding datasets with diverse rationales; (2) Progressive Two-stage Correction Learning (P2CL); and (3) Multi-rationale Contrastive Alignment (MCA), which jointly enable semantic aggregation of correct reasoning and boundary separation of incorrect reasoning. MIND unifies rationale generation, discriminative training, and multi-rationale modeling within a single framework. Evaluated on diverse public benchmarks spanning scientific, commonsense, and mathematical reasoning, it achieves state-of-the-art performance. Crucially, MIND significantly enhances MLLMs' logical robustness against misleading rationales and their capacity for multi-rationale reasoning.

📝 Abstract
Recently, multimodal large language models (MLLMs) have been widely applied to reasoning tasks. However, they suffer from limited multi-rationale semantic modeling, insufficient logical robustness, and susceptibility to misleading interpretations in complex scenarios. Therefore, we propose a Multi-rationale INtegrated Discriminative (MIND) reasoning framework, which is designed to endow MLLMs with human-like cognitive abilities of "Understand -> Rethink -> Correct", and achieves a paradigm evolution from passive imitation-based reasoning to active discriminative reasoning. Specifically, we introduce a Rationale Augmentation and Discrimination (RAD) paradigm, which automatically and efficiently expands existing datasets by generating diverse rationales, providing a unified and extensible data foundation. Meanwhile, we design a Progressive Two-stage Correction Learning (P2CL) strategy. The first phase enhances multi-rationale positive learning, while the second phase enables active logic discrimination and correction. In addition, to mitigate representation entanglement in the multi-rationale semantic space, we propose a Multi-rationale Contrastive Alignment (MCA) optimization strategy, which achieves semantic aggregation of correct reasoning and boundary separation of incorrect reasoning. Extensive experiments demonstrate that the proposed MIND reasoning framework achieves state-of-the-art (SOTA) performance on multiple public datasets covering scientific, commonsense, and mathematical scenarios. It provides a new perspective for advancing MLLMs towards higher levels of cognitive intelligence. Our code is available at https://github.com/YuChuang1205/MIND
Problem

Research questions and friction points this paper is trying to address.

Enhances multi-rationale semantic modeling in MLLMs
Improves logical robustness against misleading interpretations
Enables active discriminative reasoning over passive imitation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates diverse rationales to augment datasets automatically
Uses two-stage learning for positive reinforcement and error correction
Aligns correct reasoning and separates incorrect reasoning semantically
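The third bullet, semantic aggregation of correct rationales and boundary separation of incorrect ones, can be illustrated with an InfoNCE-style contrastive objective. This is a minimal sketch of the general idea, not the paper's actual MCA loss; the function name `mca_loss`, the temperature value, and the toy embeddings are all illustrative assumptions.

```python
import numpy as np

def mca_loss(anchor, positives, negatives, tau=0.1):
    """Hypothetical sketch of a multi-rationale contrastive objective.

    Pulls embeddings of correct rationales toward an anchor embedding
    (semantic aggregation) while pushing incorrect-rationale embeddings
    away (boundary separation), averaged over all positives.
    """
    def cos(a, b):
        # Cosine similarity between two embedding vectors.
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    pos_sims = np.array([cos(anchor, p) for p in positives]) / tau
    neg_sims = np.array([cos(anchor, n) for n in negatives]) / tau
    # For each positive: -log( exp(pos) / (exp(pos) + sum exp(neg)) )
    losses = [-(s - np.log(np.exp(s) + np.exp(neg_sims).sum()))
              for s in pos_sims]
    return float(np.mean(losses))
```

Under this toy objective, the loss shrinks as correct-rationale embeddings align with the anchor and grows when an incorrect rationale drifts close to it, which is the intuition behind "boundary separation" in the bullet above.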
Authors

Chuang Yu
Key Laboratory of Opto-Electronic Information Processing, Chinese Academy of Sciences
Jinmiao Zhao
Shenyang Institute of Automation, Chinese Academy of Sciences
Mingxuan Zhao
HKUST(GZ)
Yunpeng Liu
Wuhan University of Technology
Cement and concrete materials
Xiujun Shu
Tencent
Yuanhao Feng
University of Science and Technology of China
Wireless Sensing, Embodied Intelligence, Wearable Computing
Bo Wang
Tencent
Xiangyu Yue
The Chinese University of Hong Kong / UC Berkeley / Stanford University / NJU
Artificial Intelligence, Computer Vision, Multi-modal Learning