🤖 AI Summary
To address limited multi-rationale semantic modeling, insufficient logical robustness, and susceptibility to misleading interpretations in multimodal large language models (MLLMs) during complex reasoning, this paper proposes the Multi-rationale INtegrated Discriminative (MIND) reasoning framework, which aims to give MLLMs a three-step, human-like cognitive process: “Understand -> Rethink -> Correct.” It introduces three components: (1) Rationale Augmentation and Discrimination (RAD), which automatically expands existing datasets with diverse generated rationales; (2) Progressive Two-stage Correction Learning (P2CL), whose first stage strengthens learning from multiple correct rationales and whose second stage trains the model to identify and correct flawed logic; and (3) Multi-rationale Contrastive Alignment (MCA), which aggregates the representations of correct reasoning and separates them from those of incorrect reasoning. Evaluated on multiple public datasets spanning scientific, commonsense, and mathematical reasoning, MIND reports state-of-the-art performance.
📝 Abstract
Recently, multimodal large language models (MLLMs) have been widely applied to reasoning tasks. However, they suffer from limited multi-rationale semantic modeling, insufficient logical robustness, and susceptibility to misleading interpretations in complex scenarios. To address these issues, we propose a Multi-rationale INtegrated Discriminative (MIND) reasoning framework, which is designed to endow MLLMs with the human-like cognitive abilities of "Understand -> Rethink -> Correct" and shifts the paradigm from passive imitation-based reasoning to active discriminative reasoning. Specifically, we introduce a Rationale Augmentation and Discrimination (RAD) paradigm, which automatically and efficiently expands existing datasets by generating diverse rationales, providing a unified and extensible data foundation. We also design a Progressive Two-stage Correction Learning (P2CL) strategy: the first stage enhances multi-rationale positive learning, while the second stage enables active logic discrimination and correction. In addition, to mitigate representation entanglement in the multi-rationale semantic space, we propose a Multi-rationale Contrastive Alignment (MCA) optimization strategy, which achieves semantic aggregation of correct reasoning and boundary separation of incorrect reasoning. Extensive experiments demonstrate that the proposed MIND reasoning framework achieves state-of-the-art (SOTA) performance on multiple public datasets covering scientific, commonsense, and mathematical scenarios. It provides a new perspective for advancing MLLMs towards higher levels of cognitive intelligence. Our code is available at https://github.com/YuChuang1205/MIND
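
The abstract does not give the MCA objective itself. As a rough illustration of what "semantic aggregation of correct reasoning and boundary separation of incorrect reasoning" could look like, the sketch below uses a generic InfoNCE-style contrastive loss over embedding vectors. This is an assumption standing in for the paper's actual formulation: the function names (`cosine`, `contrastive_alignment_loss`), the `temperature` parameter, and the toy vectors are all hypothetical and are not taken from the MIND code base.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_alignment_loss(anchor, correct, incorrect, temperature=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's exact MCA loss).

    The loss is small when `anchor` is similar to the embeddings of correct
    rationales and dissimilar to the embeddings of incorrect rationales.
    """
    pos = [math.exp(cosine(anchor, c) / temperature) for c in correct]
    neg = [math.exp(cosine(anchor, w) / temperature) for w in incorrect]
    denom = sum(pos) + sum(neg)
    # Average the negative log-probability assigned to each correct rationale.
    return -sum(math.log(p / denom) for p in pos) / len(pos)


# Hypothetical 2-D embeddings, chosen only to make the behaviour visible.
anchor = [1.0, 0.0]
correct_rationales = [[0.9, 0.1], [1.0, 0.2]]
incorrect_rationales = [[-1.0, 0.0], [0.0, 1.0]]

aligned = contrastive_alignment_loss(anchor, correct_rationales, incorrect_rationales)
misaligned = contrastive_alignment_loss(anchor, incorrect_rationales, correct_rationales)
print(aligned < misaligned)
```

Minimizing a loss of this shape pulls the anchor toward the correct-rationale embeddings and pushes it away from the incorrect ones, which is one common way to reduce the kind of representation entanglement the abstract mentions. The released repository should be consulted for the objective the authors actually use.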