🤖 AI Summary
This work addresses fairness degradation and catastrophic forgetting in large multimodal models (LMMs) under continual learning, where imbalanced data distributions exacerbate both issues. To mitigate them, the authors propose a novel paradigm based on Direct Preference Optimization (DPO), introducing a φ-DPO loss function that explicitly models and alleviates distributional bias. By aligning the learning process with pairwise preference signals, the method simultaneously suppresses catastrophic forgetting and improves fairness. Notably, this study is the first to integrate fairness constraints into a multimodal continual learning framework, and it introduces the first benchmark with pairwise preference annotations tailored to this setting. Experimental results demonstrate that the proposed approach significantly outperforms existing methods across multiple benchmarks, achieving state-of-the-art performance.
📝 Abstract
Fairness in continual learning for Large Multimodal Models (LMMs) is an emerging yet underexplored challenge, particularly in the presence of imbalanced data distributions that can lead to biased model updates and suboptimal performance across tasks. While recent continual learning studies have made progress on catastrophic forgetting, the fairness problem caused by imbalanced data remains largely unaddressed. This paper presents a novel Fairness Direct Preference Optimization (FaiDPO, or φ-DPO) framework for continual learning in LMMs. In particular, we first propose a new continual learning paradigm based on Direct Preference Optimization (DPO) that mitigates catastrophic forgetting by aligning learning with pairwise preference signals. We then identify the limitations of conventional DPO on imbalanced data and present a new φ-DPO loss that explicitly addresses distributional bias. We provide a comprehensive theoretical analysis demonstrating that our approach addresses both forgetting and data imbalance. Additionally, to enable φ-DPO-based continual learning, we construct pairwise preference annotations for existing continual learning benchmarks. Extensive experiments and ablation studies show that the proposed φ-DPO achieves state-of-the-art performance across multiple benchmarks, outperforming prior continual learning methods for LMMs.
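The abstract does not spell out the φ-DPO loss itself, but it builds on the standard DPO objective, which trains the policy to widen the log-probability margin between a preferred and a dispreferred response relative to a frozen reference model. As a minimal sketch (assuming scalar per-pair log-probabilities; the function name and signature here are illustrative, not from the paper):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single preference pair.

    logp_*      : policy log-probabilities of the preferred ("chosen") and
                  dispreferred ("rejected") responses.
    ref_logp_*  : the same quantities under the frozen reference model.
    beta        : temperature scaling the implicit reward margin.

    Returns -log(sigmoid(beta * margin)), where the margin compares how much
    more the policy prefers the chosen response than the reference does.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

At zero margin the loss equals log 2; it decreases as the policy favors the chosen response beyond the reference. The paper's φ-DPO variant additionally corrects for distributional bias on imbalanced data, a modification not reproducible from the abstract alone.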