DAMO: Data- and Model-aware Alignment of Multi-modal LLMs

📅 2025-02-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current multimodal large language models (MLLMs) exhibit imbalanced responses to easy versus hard samples during preference alignment: overfitting on easily distinguishable instances while underfitting on challenging ones. To address this, we propose a dynamic, joint optimization framework that simultaneously perceives data hardness and model response—introducing the first dual-driven DPO variant grounded in *data hardness awareness* and *model response awareness*. Our method quantifies image-text matching difficulty and integrates model output confidence to construct a difficulty-adaptive dynamic weighting mechanism, enabling fine-grained alignment. Evaluated across five benchmarks, our approach significantly enhances reliability and generalization: DAMO-7B reduces response-level and mention-level hallucinations by 90.0% and 95.3%, respectively, on Object HalBench—outperforming GPT-4V.


📝 Abstract
Direct Preference Optimization (DPO) has shown effectiveness in aligning multi-modal large language models (MLLMs) with human preferences. However, existing methods exhibit imbalanced responsiveness to data of varying hardness, tending to overfit on easy-to-distinguish data while underfitting on hard-to-distinguish data. In this paper, we propose Data- and Model-aware DPO (DAMO) to dynamically adjust the optimization process from two key aspects: (1) a data-aware strategy that incorporates data hardness, and (2) a model-aware strategy that integrates real-time model responses. By combining the two strategies, DAMO enables the model to adapt effectively to data with varying levels of hardness. Extensive experiments on five benchmarks demonstrate that DAMO not only significantly enhances trustworthiness but also improves effectiveness on general tasks. For instance, on Object HalBench, our DAMO-7B reduces response-level and mention-level hallucinations by 90.0% and 95.3%, respectively, surpassing the performance of GPT-4V.
Problem

Research questions and friction points this paper is trying to address.

Align multi-modal LLMs with human preferences
Address imbalanced responsiveness to easy versus hard data
Enhance model adaptation to varying data hardness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic adjustment of optimization process
Data-aware strategy incorporating hardness
Model-aware strategy integrating responses
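The two strategies can be read as a single difficulty-adaptive weight on the standard DPO objective. The sketch below is a minimal illustration only: the paper's exact hardness measure and combination rule are not reproduced here, so `data_hardness`, `gamma`, and the weighting formula are assumptions for exposition.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def weighted_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                      data_hardness, beta=0.1, gamma=1.0):
    """Hardness- and response-aware DPO loss (illustrative sketch).

    data_hardness: precomputed image-text matching difficulty in [0, 1]
    (hypothetical score; stands in for the paper's data-aware signal).
    The model-aware signal is the implicit reward margin, a live
    confidence measure from the policy vs. reference log-probs.
    """
    # Implicit reward margin between the chosen (w) and rejected (l)
    # responses -- the standard DPO quantity.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Model-aware term: a large margin means the pair is already easy
    # for the current model, so its weight should shrink.
    confidence = sigmoid(margin)
    # Difficulty-adaptive weight: emphasize hard data the model is not
    # yet confident on (assumed combination, not the paper's formula).
    weight = (1.0 - confidence) * (1.0 + gamma * data_hardness)
    # Weighted negative log-sigmoid of the margin, as in DPO.
    return -weight * math.log(sigmoid(margin))
```

Under this weighting, a hard pair the model has not yet separated contributes more to the gradient than an easy, already-separated one, which is the imbalance the Problem section describes.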