Beyond Compromise: Pareto-Lenient Consensus for Efficient Multi-Preference LLM Alignment

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multi-preference alignment methods for large language models often converge to suboptimal local equilibria and struggle to approximate the global Pareto frontier. This work proposes the Pareto-Lenient Consensus (PLC) framework, which, for the first time, brings a dynamic game-theoretic negotiation mechanism into multi-objective preference alignment. Through consensus-driven lenient gradient corrections, PLC tolerates temporary degradation on some objectives whenever the dominant coalition of objectives holds a sufficient surplus, allowing the optimization to escape local equilibria. The approach overcomes the limitations of conventional static scalarization and rigid gradient projection, significantly outperforming existing baselines in both fixed-preference alignment and Pareto frontier quality, while also providing theoretical convergence guarantees.
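
To make the baseline contrast concrete: the static linear scalarization that PLC moves beyond collapses all preference losses into one fixed weighted sum, so weights chosen once pin the solution to a single point of the trade-off curve. A minimal sketch of that baseline (the function name, example losses, and weights are illustrative assumptions, not from the paper):

```python
import numpy as np

def scalarize(losses, weights):
    """Static linear scalarization: a fixed convex combination of
    per-preference losses, decided once and never renegotiated."""
    w = np.asarray(weights, dtype=float)
    return float(np.asarray(losses) @ (w / w.sum()))

# e.g., helpfulness vs. harmlessness losses with fixed 70/30 weights
print(scalarize([0.8, 0.3], [0.7, 0.3]))  # -> 0.65
```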
📝 Abstract
Transcending the single-preference paradigm, aligning LLMs with diverse human values is pivotal for robust deployment. Contemporary Multi-Objective Preference Alignment (MPA) approaches predominantly rely on static linear scalarization or rigid gradient projection to navigate these trade-offs. However, by enforcing strict conflict avoidance or simultaneous descent, these paradigms often converge prematurely to local stationary points. While mathematically stable, such points represent a conservative compromise in which the model sacrifices potential global Pareto improvements to avoid transient local trade-offs. To break this deadlock, we propose Pareto-Lenient Consensus (PLC), a game-theoretic framework that reimagines alignment as a dynamic negotiation process. Unlike rigid approaches, PLC introduces consensus-driven lenient gradient rectification, which dynamically tolerates local degradation provided there is a sufficient surplus within the dominant coalition, thereby empowering the optimization trajectory to escape suboptimal local equilibria and explore the distal Pareto-optimal frontier. Theoretical analysis validates that PLC facilitates stalemate escape and asymptotically converges to a Pareto consensus equilibrium. Moreover, extensive experiments show that PLC surpasses baselines in both fixed-preference alignment and global Pareto frontier quality. This work highlights the potential of negotiation-driven alignment as a promising avenue for MPA. Our code is available at https://anonymous.4open.science/r/aaa-6BB8.
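
The abstract does not spell out the rectification rule, but the "lenient" idea admits a simple first-order reading: step along a consensus direction whenever the predicted gains of the improving objectives (the dominant coalition) outweigh the predicted losses of the dissenting ones by some margin, and only otherwise fall back to rigid conflict-avoiding projection. The sketch below illustrates that reading under stated assumptions; it is not the authors' algorithm, and the `tolerance` margin, the mean-gradient consensus, and the PCGrad-style fallback are all stand-ins.

```python
import numpy as np

def lenient_consensus_step(grads, tolerance=0.5):
    """Illustrative lenient gradient rectification (not the paper's PLC).

    grads: (k, d) array, one gradient per preference objective.
    tolerance: how much dissenting loss the coalition surplus must cover.
    """
    consensus = grads.mean(axis=0)      # candidate negotiated direction
    gains = grads @ consensus           # first-order effect per objective
    surplus = gains[gains > 0].sum()    # dominant coalition's predicted gain
    deficit = -gains[gains < 0].sum()   # dissenters' predicted degradation
    if deficit <= tolerance * surplus:
        # Lenient branch: accept temporary local degradation, since the
        # coalition surplus is large enough to "pay" for it.
        return consensus
    # Rigid fallback: PCGrad-style projection that removes pairwise conflicts.
    rectified = grads.copy()
    for i in range(len(grads)):
        for j in range(len(grads)):
            dot = rectified[i] @ grads[j]
            if i != j and dot < 0:
                rectified[i] -= dot / (grads[j] @ grads[j] + 1e-12) * grads[j]
    return rectified.mean(axis=0)

# Objective 1 locally degrades along the consensus direction, but the
# coalition surplus covers the deficit, so the step is accepted as-is.
g = np.array([[1.0, 0.0], [-0.8, 0.3]])
print(lenient_consensus_step(g))  # -> [0.1, 0.15]
```

Under this reading, `tolerance` plays the role the abstract ascribes to leniency: setting it to zero recovers strict conflict avoidance, while larger values let the coalition trade transient local losses for exploration toward the distal frontier.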
Problem

Research questions and friction points this paper is trying to address.

Multi-Preference Alignment
Pareto Optimality
LLM Alignment
Preference Trade-offs
Multi-Objective Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pareto-Lenient Consensus
Multi-Objective Preference Alignment
Gradient Rectification
Game-Theoretic Framework
Pareto Frontier