AI Summary
In multi-objective alignment (MOA), conflicting human preferences impede convergence to the Pareto frontier, causing gradient direction inconsistencies in DPO-based methods. To address this, we propose a self-improving DPO framework, the first to integrate autonomous generation and Pareto-optimal response selection directly into the DPO pipeline. Specifically, an LLM performs self-reflection to generate diverse candidate responses; these are then filtered via multi-objective preference modeling and Pareto dominance testing, yielding high-quality self-supervised preference pairs. By bypassing explicit preference conflict resolution, our method enables end-to-end optimization toward the Pareto frontier. Evaluated on two standard MOA benchmarks, it achieves significant improvements in Pareto coverage and hypervolume, consistently outperforming existing MOA approaches.
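The core filtering step described above, keeping only candidate responses whose multi-objective scores are not Pareto-dominated, can be sketched in a few lines. This is an illustrative reconstruction, not the paper's actual implementation; the candidate names and score vectors are hypothetical, and scores would in practice come from per-objective preference/reward models.

```python
def dominates(a, b):
    """True if score vector `a` Pareto-dominates `b`: at least as good
    on every objective and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scored):
    """Keep (response, scores) pairs not dominated by any other candidate."""
    return [
        (resp, s) for resp, s in scored
        if not any(dominates(t, s) for _, t in scored if t != s)
    ]

# Hypothetical candidates scored on (helpfulness, harmlessness)
scored = [("r1", (0.9, 0.2)), ("r2", (0.6, 0.8)), ("r3", (0.5, 0.5))]
front = pareto_front(scored)  # r3 is dominated by r2 and is filtered out
```

The surviving front members would then be paired against dominated responses to form self-supervised preference pairs for DPO.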
Abstract
Multi-Objective Alignment (MOA) aims to align LLMs' responses with multiple human preference objectives, with Direct Preference Optimization (DPO) emerging as a prominent approach. However, we find that DPO-based MOA approaches suffer from widespread preference conflicts in the data, where different objectives favor different responses. This results in conflicting optimization directions, hindering optimization toward the Pareto Front. To address this, we propose constructing Pareto-optimal responses to resolve preference conflicts. To efficiently obtain and utilize such responses, we propose a self-improving DPO framework that enables LLMs to self-generate and select Pareto-optimal responses for self-supervised preference alignment. Extensive experiments on two datasets demonstrate the superior Pareto Front achieved by our framework compared to various baselines. Code is available at https://github.com/zyttt-coder/SIPO.
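One of the evaluation metrics mentioned in the summary, hypervolume, measures the objective-space volume dominated by a Pareto front relative to a reference point; a larger value means a better front. A minimal two-objective (maximization) version might look like the sketch below. It assumes the front lies above the reference point and is not the paper's evaluation code.

```python
def hypervolume_2d(front, ref=(0.0, 0.0)):
    """2-D hypervolume (maximization): area dominated by `front`
    and bounded below-left by the reference point `ref`."""
    pts = sorted(front)          # ascending in objective 1
    hv, next_y = 0.0, ref[1]
    for x, y in reversed(pts):   # sweep from largest objective-1 value
        if y > next_y:           # skip points dominated on objective 2
            hv += (x - ref[0]) * (y - next_y)
            next_y = y
    return hv

# Three non-dominated points: union of rectangles has area 3 + 2 + 1 = 6
hv = hypervolume_2d([(1, 3), (2, 2), (3, 1)])  # 6.0
```

Dominated points contribute nothing: adding (1.5, 1.5) to the list above leaves the result unchanged, which is why hypervolume rewards both the quality and the spread of a front.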