Aligning Visual Contrastive Learning Models via Preference Optimization

📅 2024-11-12
🏛️ arXiv.org
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work addresses critical limitations in vision-language contrastive learning—vulnerability to typographic attacks, gender bias, and misalignment with human preferences. We propose the first integration of preference optimization (PO) into vision-language contrastive learning, jointly incorporating adversarial robustness training and sensitive-attribute disentanglement via intervention. Our method enables fine-grained semantic disentanglement and controllable alignment with sensitive attributes (e.g., gender). Experiments demonstrate substantial improvements over standard contrastive baselines across multitask evaluations: enhanced robustness against typographic attacks, a 37.2% reduction in Bias Score (indicating significantly mitigated gender bias), and maintained downstream task accuracy. The core contribution is the novel application of PO to vision-language contrastive learning, unifying improvements in model robustness, fairness, and generalization—thereby advancing the state of the art in aligned, reliable, and equitable multimodal representation learning.

📝 Abstract
Contrastive learning models have demonstrated impressive abilities to capture semantic similarities by aligning representations in the embedding space. However, their performance can be limited by the quality of the training data and its inherent biases. While Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) have been applied to generative models to align them with human preferences, their use in contrastive learning has yet to be explored. This paper introduces a novel method for training contrastive learning models using Preference Optimization (PO) to break down complex concepts. Our method systematically aligns model behavior with desired preferences, enhancing performance on the targeted task. In particular, we focus on enhancing model robustness against typographic attacks, commonly seen in contrastive models like CLIP. We further apply our method to disentangle gender understanding and mitigate gender biases, offering a more nuanced control over these sensitive attributes. Our experiments demonstrate that models trained using PO outperform standard contrastive learning techniques while retaining their ability to handle adversarial challenges and maintain accuracy on other downstream tasks. This makes our method well-suited for tasks requiring fairness, robustness, and alignment with specific preferences. We evaluate our method on several vision-language tasks, tackling challenges such as typographic attacks. Additionally, we explore the model's ability to disentangle gender concepts and mitigate gender bias, showcasing the versatility of our approach.
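The abstract does not spell out the training objective, but a natural reading is a DPO-style preference loss applied to image-text similarity logits instead of sequence log-probabilities: given a preferred pair (e.g. a clean image with its caption) and a dispreferred pair (e.g. the same image under a typographic attack), the trained model is pushed to widen the similarity margin relative to a frozen reference model. A minimal sketch of that idea follows; the function name, the `beta` temperature, and the scalar-similarity interface are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def dpo_style_contrastive_loss(sim_preferred, sim_dispreferred,
                               ref_sim_preferred, ref_sim_dispreferred,
                               beta=0.1):
    """DPO-style preference loss on image-text similarity scores.

    Pushes the trained model to score the preferred pair above the
    dispreferred one, relative to a frozen reference model -- the
    standard DPO construction, transplanted here to similarity logits.
    """
    # Log-ratio margin of the trained model over the reference model
    margin = beta * ((sim_preferred - ref_sim_preferred)
                     - (sim_dispreferred - ref_sim_dispreferred))
    # Negative log-sigmoid: near zero once the preferred pair wins clearly
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

# With no margin the loss is log(2) ~= 0.693; it shrinks as the
# preferred pair pulls ahead of the dispreferred one.
print(dpo_style_contrastive_loss(0.0, 0.0, 0.0, 0.0))  # ~0.693
print(dpo_style_contrastive_loss(5.0, 0.0, 0.0, 0.0))  # ~0.474
```

Anchoring the margin to a reference model, as in DPO, is what would let such a method reshape preferences (robustness, gender disentanglement) without drifting far from the pretrained CLIP-style embedding space, consistent with the retained downstream accuracy the abstract reports.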
Problem

Research questions and friction points this paper is trying to address.

- Model Adjustment
- Bias Reduction
- Performance Enhancement

Innovation

Methods, ideas, or system contributions that make the work stand out.

- Preference Optimization
- Bias Reduction
- Performance Enhancement
Authors

Amirabbas Afzali
Borna Khodabandeh
Ali Rasekh (L3S Research Center)
Mahyar JafariNodeh (Massachusetts Institute of Technology, USA)
Sepehr Kazemi
Simon Gottschalk (L3S Research Center)