ParetoHqD: Fast Offline Multiobjective Alignment of Large Language Models using Pareto High-quality Data

📅 2025-04-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address performance bottlenecks in offline multi-objective alignment of large language models—specifically, inadequate preference representation and imbalanced reward scoring—this paper proposes a data filtering paradigm grounded in preference direction modeling and Pareto frontier guidance. Methodologically: (1) human preferences are explicitly encoded as unit direction vectors in the objective space; (2) a two-stage mechanism is introduced—first identifying the neighborhood of the Pareto frontier, then dynamically sampling high-quality samples aligned with the target direction; (3) an end-to-end offline multi-objective alignment framework is developed, enabling customizable alignment behavior. Experiments demonstrate that the method significantly outperforms five baselines across two multi-objective alignment tasks, achieving simultaneous improvements in alignment quality, training efficiency, and objective diversity. To the best of the authors' knowledge, this is the first approach to realize direction-controllable, data-adaptive offline multi-objective alignment.

📝 Abstract
Aligning large language models with multiple human expectations and values is crucial for ensuring that they adequately serve a variety of user needs. To this end, offline multiobjective alignment algorithms such as the Rewards-in-Context algorithm have shown strong performance and efficiency. However, inappropriate preference representations and training with imbalanced reward scores limit the performance of such algorithms. In this work, we introduce ParetoHqD that addresses the above issues by representing human preferences as preference directions in the objective space and regarding data near the Pareto front as "high-quality" data. For each preference, ParetoHqD follows a two-stage supervised fine-tuning process, where each stage uses an individual Pareto high-quality training set that best matches its preference direction. The experimental results have demonstrated the superiority of ParetoHqD over five baselines on two multiobjective alignment tasks.
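The core selection idea described in the abstract—treat data near the Pareto front as "high-quality" and, for each preference, pick the front samples that best match its direction in objective space—can be sketched as follows. This is an illustrative sketch, not the paper's exact algorithm: `pareto_front_mask`, `select_high_quality`, and the cosine-similarity matching criterion are assumptions for demonstration (the paper uses a two-stage process with Pareto-neighborhood identification).

```python
import numpy as np

def pareto_front_mask(rewards: np.ndarray) -> np.ndarray:
    """Boolean mask of non-dominated points, assuming all objectives
    are maximized. rewards has shape (n_samples, n_objectives)."""
    n = rewards.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        if not mask[i]:
            continue  # dominance is transitive, so skipping is safe
        # i is dominated if some j is >= on every objective and > on one
        dominated_by = (np.all(rewards >= rewards[i], axis=1)
                        & np.any(rewards > rewards[i], axis=1))
        if dominated_by.any():
            mask[i] = False
    return mask

def select_high_quality(rewards: np.ndarray, preference, k: int = 128) -> np.ndarray:
    """Return indices of up to k Pareto-front samples whose reward
    vectors best align (by cosine similarity) with the unit
    preference direction."""
    pref = np.asarray(preference, dtype=float)
    pref = pref / np.linalg.norm(pref)          # encode preference as a unit direction
    candidates = np.flatnonzero(pareto_front_mask(rewards))
    norms = np.linalg.norm(rewards[candidates], axis=1) + 1e-12
    cos = rewards[candidates] @ pref / norms    # alignment with the target direction
    return candidates[np.argsort(-cos)[:k]]
```

With two reward objectives (e.g., helpfulness and harmlessness scores), a preference direction of `[1.0, 0.0]` would select front samples that score highest on the first objective, while `[0.7, 0.7]` would favor balanced samples near the front's knee.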
Problem

Research questions and friction points this paper is trying to address.

Aligning large language models with multiple human expectations and values
Addressing inappropriate preference representations in multiobjective alignment
Mitigating the performance loss caused by training on imbalanced reward scores
Innovation

Methods, ideas, or system contributions that make the work stand out.

Represents preferences as objective space directions
Uses Pareto front data as high-quality training sets
Two-stage fine-tuning with preference-matched Pareto data
Haoran Gu
School of Artificial Intelligence, Xidian University
Data-Driven Optimization · Trustworthy AI · Large Language Model · Machine Learning
Handing Wang
School of Artificial Intelligence, Xidian University
Evolutionary Computing · Multi-objective Optimization · Data-Driven Optimization · Trustworthy AI
Yi Mei
School of Engineering and Computer Science, Victoria University of Wellington, New Zealand
Mengjie Zhang
School of Engineering and Computer Science, Victoria University of Wellington, New Zealand
Yaochu Jin
School of Engineering, Westlake University, China