🤖 AI Summary
To address the pervasive hallucination problem in multimodal large language models (MLLMs), this paper proposes Symmetric Multimodal Preference Optimization (SymMPO). Built on the Direct Preference Optimization (DPO) framework, SymMPO performs symmetric preference learning with direct preference supervision in the form of response pairs to enhance visual understanding. It also adds a preference margin consistency loss that quantitatively regulates the preference gap between symmetric preference pairs. Existing vision-oriented contrastive objectives rely on a non-rigorous optimization objective and indirect preference supervision. SymMPO, by contrast, stays theoretically consistent with standard DPO. Across five benchmarks, SymMPO achieves superior performance in hallucination mitigation. These results suggest that symmetric, direct preference supervision combined with a DPO-consistent objective improves vision–language alignment and offers a principled approach to reducing hallucination in MLLMs.
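For reference, the standard DPO objective that SymMPO is described as remaining consistent with is shown below. Here $x$ is the prompt (for an MLLM, assumed to be the image together with the question), $y_w$ and $y_l$ are the preferred and dispreferred responses, $\pi_{\mathrm{ref}}$ is a frozen reference model, and $\beta$ scales the implicit reward. The exact SymMPO objective is not given in this summary, so only the standard form is shown.

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[ \log \sigma\!\left(
  \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
  - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right) \right]
```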
📝 Abstract
Direct Preference Optimization (DPO) has emerged as an effective approach for mitigating hallucination in Multimodal Large Language Models (MLLMs). Existing methods have made significant progress by using vision-oriented contrastive objectives to strengthen MLLMs' attention to visual inputs and thereby reduce hallucination. However, they suffer from a non-rigorous optimization objective and indirect preference supervision. To address these limitations, we propose Symmetric Multimodal Preference Optimization (SymMPO), which conducts symmetric preference learning with direct preference supervision (i.e., response pairs) to enhance visual understanding while maintaining rigorous theoretical alignment with standard DPO. In addition to conventional ordinal preference learning, SymMPO introduces a preference margin consistency loss that quantitatively regulates the preference gap between symmetric preference pairs. Comprehensive evaluation across five benchmarks demonstrates SymMPO's superior performance, validating its effectiveness in mitigating hallucination in MLLMs.
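The combination of ordinal preference learning with a margin consistency term can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm. A "symmetric pair" is taken to be two preference pairs whose chosen and rejected responses swap roles under two different visual contexts. The consistency term is assumed to be a squared difference between the two implicit DPO reward margins. The names `PairLogProbs`, `reward_margin`, `symmetric_loss`, and the weight `lam` are hypothetical and illustrative only. The paper's actual formulation may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class PairLogProbs:
    """Summed token log-probs of the chosen and rejected responses (hypothetical container)."""
    policy_chosen: float
    policy_rejected: float
    ref_chosen: float
    ref_rejected: float


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_margin(p: PairLogProbs, beta: float) -> float:
    """Implicit DPO reward margin: beta * (chosen log-ratio - rejected log-ratio)."""
    chosen_ratio = p.policy_chosen - p.ref_chosen
    rejected_ratio = p.policy_rejected - p.ref_rejected
    return beta * (chosen_ratio - rejected_ratio)


def dpo_loss(p: PairLogProbs, beta: float) -> float:
    """Standard per-example DPO loss: -log sigmoid(margin)."""
    return -log_sigmoid(reward_margin(p, beta))


def symmetric_loss(pair_a: PairLogProbs, pair_b: PairLogProbs,
                   beta: float = 0.1, lam: float = 1.0) -> float:
    """Hypothetical symmetric objective: DPO on both pairs plus an ASSUMED
    squared-difference penalty that keeps the two preference margins consistent."""
    m_a = reward_margin(pair_a, beta)
    m_b = reward_margin(pair_b, beta)
    ordinal = -log_sigmoid(m_a) - log_sigmoid(m_b)  # conventional ordinal preference terms
    consistency = (m_a - m_b) ** 2                   # assumed form of the margin consistency loss
    return ordinal + lam * consistency


if __name__ == "__main__":
    # Pair A: context 1, response y preferred over y'.
    a = PairLogProbs(policy_chosen=-10.0, policy_rejected=-20.0, ref_chosen=-15.0, ref_rejected=-15.0)
    # Pair B: context 2, roles swapped (y' preferred over y); no preference learned yet.
    b = PairLogProbs(policy_chosen=-15.0, policy_rejected=-15.0, ref_chosen=-15.0, ref_rejected=-15.0)
    print(reward_margin(a, 0.1), reward_margin(b, 0.1), symmetric_loss(a, b))
```

In this sketch the ordinal terms only ask each margin to be positive. The assumed consistency term additionally penalizes the case where one side of the symmetric pair is learned far more strongly than the other, which is one plausible reading of "quantitatively regulating the preference gap".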