Beyond Compromise: Pareto-Lenient Consensus for Efficient Multi-Preference LLM Alignment

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multi-preference alignment methods for large language models often converge to suboptimal local equilibria and struggle to approximate the global Pareto frontier. This work proposes the Pareto-Lenient Consensus (PLC) framework, which, for the first time, brings a dynamic game-theoretic negotiation mechanism into multi-objective preference alignment. Through consensus-driven lenient gradient corrections, PLC tolerates temporary degradation on some objectives whenever the dominant coalition of objectives holds a sufficient surplus, allowing the optimization to escape local equilibria. The approach overcomes the limitations of conventional static scalarization and rigid gradient projection, significantly outperforming existing baselines in both fixed-preference alignment and Pareto frontier quality, while also providing theoretical convergence guarantees.
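
To make the baseline contrast concrete: the static linear scalarization that PLC moves beyond collapses all preference losses into one fixed weighted sum, so weights chosen once pin the solution to a single point of the trade-off curve. A minimal sketch of that baseline (the function name, example losses, and weights are illustrative assumptions, not from the paper):

```python
import numpy as np

def scalarize(losses, weights):
    """Static linear scalarization: a fixed convex combination of
    per-preference losses, decided once and never renegotiated."""
    w = np.asarray(weights, dtype=float)
    return float(np.asarray(losses) @ (w / w.sum()))

# e.g., helpfulness vs. harmlessness losses with fixed 70/30 weights
print(scalarize([0.8, 0.3], [0.7, 0.3]))  # -> 0.65
```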
📝 Abstract
Transcending the single-preference paradigm, aligning LLMs with diverse human values is pivotal for robust deployment. Contemporary Multi-Objective Preference Alignment (MPA) approaches predominantly rely on static linear scalarization or rigid gradient projection to navigate these trade-offs. However, by enforcing strict conflict avoidance or simultaneous descent, these paradigms often converge prematurely to local stationary points. While mathematically stable, such points represent a conservative compromise in which the model sacrifices potential global Pareto improvements to avoid transient local trade-offs. To break this deadlock, we propose Pareto-Lenient Consensus (PLC), a game-theoretic framework that reimagines alignment as a dynamic negotiation process. Unlike rigid approaches, PLC introduces consensus-driven lenient gradient rectification, which dynamically tolerates local degradation provided there is a sufficient surplus within the dominant coalition, thereby empowering the optimization trajectory to escape suboptimal local equilibria and explore the distal Pareto-optimal frontier. Theoretical analysis validates that PLC facilitates stalemate escape and asymptotically converges to a Pareto consensus equilibrium. Moreover, extensive experiments show that PLC surpasses baselines in both fixed-preference alignment and global Pareto frontier quality. This work highlights the potential of negotiation-driven alignment as a promising avenue for MPA. Our code is available at https://anonymous.4open.science/r/aaa-6BB8.
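
The abstract does not spell out the rectification rule, but the "lenient" idea admits a simple first-order reading: step along a consensus direction whenever the predicted gains of the improving objectives (the dominant coalition) outweigh the predicted losses of the dissenting ones by some margin, and only otherwise fall back to rigid conflict-avoiding projection. The sketch below illustrates that reading under stated assumptions; it is not the authors' algorithm, and the `tolerance` margin, the mean-gradient consensus, and the PCGrad-style fallback are all stand-ins.

```python
import numpy as np

def lenient_consensus_step(grads, tolerance=0.5):
    """Illustrative lenient gradient rectification (not the paper's PLC).

    grads: (k, d) array, one gradient per preference objective.
    tolerance: how much dissenting loss the coalition surplus must cover.
    """
    consensus = grads.mean(axis=0)      # candidate negotiated direction
    gains = grads @ consensus           # first-order effect per objective
    surplus = gains[gains > 0].sum()    # dominant coalition's predicted gain
    deficit = -gains[gains < 0].sum()   # dissenters' predicted degradation
    if deficit <= tolerance * surplus:
        # Lenient branch: accept temporary local degradation, since the
        # coalition surplus is large enough to "pay" for it.
        return consensus
    # Rigid fallback: PCGrad-style projection that removes pairwise conflicts.
    rectified = grads.copy()
    for i in range(len(grads)):
        for j in range(len(grads)):
            dot = rectified[i] @ grads[j]
            if i != j and dot < 0:
                rectified[i] -= dot / (grads[j] @ grads[j] + 1e-12) * grads[j]
    return rectified.mean(axis=0)

# Objective 1 locally degrades along the consensus direction, but the
# coalition surplus covers the deficit, so the step is accepted as-is.
g = np.array([[1.0, 0.0], [-0.8, 0.3]])
print(lenient_consensus_step(g))  # -> [0.1, 0.15]
```

Under this reading, `tolerance` plays the role the abstract ascribes to leniency: setting it to zero recovers strict conflict avoidance, while larger values let the coalition trade transient local losses for exploration toward the distal frontier.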
Problem

Research questions and friction points this paper is trying to address.

Multi-Preference Alignment
Pareto Optimality
LLM Alignment
Preference Trade-offs
Multi-Objective Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pareto-Lenient Consensus
Multi-Objective Preference Alignment
Gradient Rectification
Game-Theoretic Framework
Pareto Frontier