🤖 AI Summary
To address performance bottlenecks in offline multi-objective alignment of large language models—specifically, inadequate preference representation and imbalanced reward scoring—this paper proposes a novel data filtering paradigm grounded in preference direction modeling and Pareto frontier guidance. Methodologically: (1) human preferences are explicitly encoded as unit direction vectors in the objective space; (2) a two-stage mechanism is introduced—first identifying the neighborhood of the Pareto frontier, then dynamically sampling high-quality samples aligned with the target direction; (3) an end-to-end offline multi-objective alignment framework is developed, enabling customizable alignment behavior. Experiments demonstrate that the proposed method significantly outperforms five baselines across two multi-objective alignment tasks, achieving simultaneous improvements in alignment quality, training efficiency, and objective diversity. To the best of our knowledge, this is the first approach to realize direction-controllable, data-adaptive offline multi-objective alignment.
📝 Abstract
Aligning large language models with multiple human expectations and values is crucial for ensuring that they adequately serve a variety of user needs. To this end, offline multi-objective alignment algorithms such as the Rewards-in-Context algorithm have shown strong performance and efficiency. However, inappropriate preference representations and training with imbalanced reward scores limit the performance of such algorithms. In this work, we introduce ParetoHqD, which addresses the above issues by representing human preferences as preference directions in the objective space and regarding data near the Pareto front as "high-quality" data. For each preference, ParetoHqD follows a two-stage supervised fine-tuning process, where each stage uses an individual Pareto high-quality training set that best matches its preference direction. Experimental results demonstrate the superiority of ParetoHqD over five baselines on two multi-objective alignment tasks.
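The core filtering idea—keep samples near the Pareto front of the multi-reward objective space, then rank them by how well their reward vector aligns with a preference direction—can be illustrated with a minimal sketch. This is not the paper's exact algorithm; the dominance tolerance `tol`, the min-max normalization, and the cosine-similarity ranking are illustrative assumptions for the sketch.

```python
import numpy as np

def near_pareto_mask(rewards: np.ndarray, tol: float = 0.05) -> np.ndarray:
    """Boolean mask of samples in the neighborhood of the Pareto front.

    rewards: (n, m) array of reward scores, higher is better on every objective.
    A sample is dropped only if some other sample beats it by at least `tol`
    on every objective (an illustrative notion of "near the front").
    """
    n = len(rewards)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        others = np.delete(rewards, i, axis=0)
        dominated = np.any(np.all(others >= rewards[i] + tol, axis=1))
        mask[i] = not dominated
    return mask

def select_for_preference(rewards: np.ndarray, preference: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k near-Pareto samples best matching a preference direction."""
    d = preference / np.linalg.norm(preference)          # unit preference direction
    lo, hi = rewards.min(axis=0), rewards.max(axis=0)
    z = (rewards - lo) / np.where(hi > lo, hi - lo, 1.0)  # min-max normalize per objective
    cos = z @ d / np.maximum(np.linalg.norm(z, axis=1), 1e-12)
    idx = np.where(near_pareto_mask(rewards))[0]          # restrict to the front neighborhood
    return idx[np.argsort(-cos[idx])][:k]                 # best direction match first
```

For example, with two objectives and preference direction (1, 1), a balanced sample such as (0.5, 0.5) on the front outranks the extreme points (1, 0) and (0, 1), while a dominated point like (0.1, 0.1) is filtered out before ranking.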