AI Summary
In multi-objective alignment (MOA), conflicting human preferences impede convergence to the Pareto frontier, causing gradient direction inconsistencies in DPO-based methods. To address this, we propose a self-improving DPO framework, the first to integrate autonomous generation and Pareto-optimal response selection directly into the DPO pipeline. Specifically, an LLM performs self-reflection to generate diverse candidate responses; these are then filtered via multi-objective preference modeling and Pareto dominance testing, yielding high-quality self-supervised preference pairs. By bypassing explicit preference conflict resolution, our method enables end-to-end optimization toward the Pareto frontier. Evaluated on two standard MOA benchmarks, it achieves significant improvements in Pareto coverage and hypervolume, consistently outperforming existing MOA approaches.
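The core filtering step described above, keeping only candidate responses whose multi-objective scores are not Pareto-dominated, can be sketched in a few lines. This is an illustrative reconstruction, not the paper's actual implementation; the candidate names and score vectors are hypothetical, and scores would in practice come from per-objective preference/reward models.

```python
def dominates(a, b):
    """True if score vector `a` Pareto-dominates `b`: at least as good
    on every objective and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scored):
    """Keep (response, scores) pairs not dominated by any other candidate."""
    return [
        (resp, s) for resp, s in scored
        if not any(dominates(t, s) for _, t in scored if t != s)
    ]

# Hypothetical candidates scored on (helpfulness, harmlessness)
scored = [("r1", (0.9, 0.2)), ("r2", (0.6, 0.8)), ("r3", (0.5, 0.5))]
front = pareto_front(scored)  # r3 is dominated by r2 and is filtered out
```

The surviving front members would then be paired against dominated responses to form self-supervised preference pairs for DPO.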
Abstract
Multi-Objective Alignment (MOA) aims to align LLMs' responses with multiple human preference objectives, with Direct Preference Optimization (DPO) emerging as a prominent approach. However, we find that DPO-based MOA approaches suffer from widespread preference conflicts in the data, where different objectives favor different responses. This results in conflicting optimization directions, hindering optimization toward the Pareto Front. To address this, we propose constructing Pareto-optimal responses to resolve preference conflicts. To efficiently obtain and utilize such responses, we propose a self-improving DPO framework that enables LLMs to self-generate and select Pareto-optimal responses for self-supervised preference alignment. Extensive experiments on two datasets demonstrate the superior Pareto Front achieved by our framework compared to various baselines. Code is available at https://github.com/zyttt-coder/SIPO.
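One of the evaluation metrics mentioned in the summary, hypervolume, measures the objective-space volume dominated by a Pareto front relative to a reference point; a larger value means a better front. A minimal two-objective (maximization) version might look like the sketch below. It assumes the front lies above the reference point and is not the paper's evaluation code.

```python
def hypervolume_2d(front, ref=(0.0, 0.0)):
    """2-D hypervolume (maximization): area dominated by `front`
    and bounded below-left by the reference point `ref`."""
    pts = sorted(front)          # ascending in objective 1
    hv, next_y = 0.0, ref[1]
    for x, y in reversed(pts):   # sweep from largest objective-1 value
        if y > next_y:           # skip points dominated on objective 2
            hv += (x - ref[0]) * (y - next_y)
            next_y = y
    return hv

# Three non-dominated points: union of rectangles has area 3 + 2 + 1 = 6
hv = hypervolume_2d([(1, 3), (2, 2), (3, 1)])  # 6.0
```

Dominated points contribute nothing: adding (1.5, 1.5) to the list above leaves the result unchanged, which is why hypervolume rewards both the quality and the spread of a front.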