Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization

📅 2025-06-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the pervasive hallucination problem in multimodal large language models (MLLMs), this paper proposes Symmetric Multimodal Preference Optimization (SymMPO). Built on the Direct Preference Optimization (DPO) framework, SymMPO performs symmetric preference learning with direct preference supervision in the form of response pairs. It also adds a preference margin consistency loss that regulates the preference gap between symmetric preference pairs. Earlier vision-oriented contrastive objectives are not rigorously derived and supervise preferences only indirectly. SymMPO, by contrast, stays theoretically consistent with standard DPO while strengthening the model's attention to visual inputs. In evaluation across five benchmarks, SymMPO outperforms existing methods at mitigating hallucination. The results suggest that symmetric supervision, combined with a formulation consistent with DPO theory, improves vision-language alignment and offers a principled approach to hallucination mitigation in MLLMs.

📝 Abstract

Direct Preference Optimization (DPO) has emerged as an effective approach for mitigating hallucination in Multimodal Large Language Models (MLLMs). Existing methods have made significant progress by using vision-oriented contrastive objectives to strengthen MLLMs' attention to visual inputs and thereby reduce hallucination. However, they suffer from non-rigorous optimization objectives and indirect preference supervision. To address these limitations, we propose Symmetric Multimodal Preference Optimization (SymMPO), which conducts symmetric preference learning with direct preference supervision (i.e., response pairs) to enhance visual understanding, while maintaining rigorous theoretical alignment with standard DPO. In addition to conventional ordinal preference learning, SymMPO introduces a preference margin consistency loss to quantitatively regulate the preference gap between symmetric preference pairs. Comprehensive evaluation across five benchmarks demonstrates SymMPO's superior performance, validating its effectiveness in mitigating hallucination in MLLMs.
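SymMPO is described as staying aligned with standard DPO, so it may help to recall the standard per-pair DPO objective. The following is a minimal, dependency-free sketch, not the authors' code. It assumes that the summed log-probability of each full response, conditioned on the image and prompt, has already been computed under both the trainable policy and a frozen reference model. The function and parameter names, and the default `beta=0.1`, are illustrative choices.

```python
import math


def log_sigmoid(z: float) -> float:
    # Numerically stable log(sigmoid(z)) for both signs of z.
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair.

    Each *_logp is the summed log-probability of a whole response given the
    (image, prompt) input, under the policy or the frozen reference model.
    """
    # Implicit rewards: beta-scaled log-ratio of policy vs. reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Ordinal preference: push the chosen reward above the rejected one.
    margin = chosen_reward - rejected_reward
    return -log_sigmoid(margin)


if __name__ == "__main__":
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0))  # zero margin
    print(dpo_loss(-8.0, -12.0, -10.0, -10.0))   # policy favors chosen
```

When the policy and reference log-probabilities are identical, the margin is zero and the loss equals log 2 (about 0.693). If the policy raises the likelihood of the preferred response relative to the reference, or lowers that of the dispreferred one, the loss drops below log 2.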

Problem

Research questions and friction points this paper is trying to address.

Reducing hallucination in Multimodal Large Language Models

Enhancing visual understanding with direct preference supervision

Ensuring rigorous theoretical alignment with standard DPO

Innovation

Methods, ideas, or system contributions that make the work stand out.

Symmetric Multimodal Preference Optimization (SymMPO)

Direct preference supervision for visual enhancement

Preference margin consistency loss regulation
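The exact form of the preference margin consistency loss is not given here. The following is therefore a purely hypothetical illustration of how an ordinal DPO term on two symmetric pairs could be combined with a term that ties their preference margins together. All of the following are assumptions, not SymMPO's published formulation: the pairing scheme (two mirrored pairs whose preferred and dispreferred roles swap), the squared-difference penalty, the weight `lam`, and every function name.

```python
import math


def log_sigmoid(z: float) -> float:
    # Numerically stable log(sigmoid(z)) for both signs of z.
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_margin(pol_w: float, pol_l: float, ref_w: float, ref_l: float,
               beta: float) -> float:
    # Implicit-reward gap between the preferred (w) and dispreferred (l)
    # response, each given as a summed log-prob under policy / reference.
    return beta * ((pol_w - ref_w) - (pol_l - ref_l))


def symmetric_loss(pair_a, pair_b, beta: float = 0.1, lam: float = 1.0) -> float:
    """HYPOTHETICAL sketch, not the SymMPO loss.

    pair_a, pair_b: tuples (pol_w, pol_l, ref_w, ref_l) for two mirrored
    preference pairs (assumed: the same two responses judged under two
    different images, so which one is preferred flips between the pairs).
    """
    m_a = dpo_margin(*pair_a, beta)
    m_b = dpo_margin(*pair_b, beta)
    # Ordinal preference learning: a standard DPO term on each pair.
    ordinal = -log_sigmoid(m_a) - log_sigmoid(m_b)
    # Assumed consistency term: penalize unequal margins between the pairs.
    consistency = (m_a - m_b) ** 2
    return ordinal + lam * consistency


if __name__ == "__main__":
    a = (-8.0, -12.0, -10.0, -10.0)  # margin 0.4 with beta = 0.1
    b = (-9.0, -11.0, -10.0, -10.0)  # margin 0.2 with beta = 0.1
    print(symmetric_loss(a, b))
```

The squared gap is only one simple way to "regulate" two margins relative to each other. It is zero when the pairs are equally well separated and grows quadratically as they diverge. The actual method may use a different target or functional form.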

🔎 Similar Papers

What to align in multimodal contrastive learning?