MGDA-Decoupled: Geometry-Aware Multi-Objective Optimisation for DPO-based LLM Alignment

📅 2026-04-22

🤖 AI Summary
This work addresses the conflicts that arise among multiple alignment objectives for large language models, such as helpfulness, truthfulness, and harmlessness, where a fixed scalarisation can systematically under-weight harder-to-optimise or minority objectives. The paper proposes MGDA-Decoupled, a geometry-aware multi-objective optimisation method based on the Multiple Gradient Descent Algorithm (MGDA) that operates entirely within the Direct Preference Optimisation (DPO) framework. The method finds a descent direction shared by all objectives while accounting for each objective's convergence dynamics, aiming for more equitable trade-offs without reinforcement learning or an explicit reward model. In experiments on the UltraFeedback dataset, geometry-aware methods, and MGDA-Decoupled in particular, achieve the highest win rates against golden responses, both overall and per objective.
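For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO preference loss for a single objective, written in NumPy. It is not taken from the paper: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions, and the inputs are assumed to be summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model.

```python
import numpy as np

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss averaged over a batch (illustrative sketch).

    All arguments are arrays of sequence log-probabilities; `beta` is the
    usual DPO temperature (the value 0.1 here is an assumption).
    """
    # Log-ratio of policy to reference for the preferred and rejected responses.
    chosen_ratio = np.asarray(policy_chosen_logp) - np.asarray(ref_chosen_logp)
    rejected_ratio = np.asarray(policy_rejected_logp) - np.asarray(ref_rejected_logp)
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)), computed stably with logaddexp.
    return float(np.mean(np.logaddexp(0.0, -margin)))
```

In a multi-objective setting one such loss would be computed per objective (for example from per-criterion preference pairs), which is where the question of how to combine the resulting gradients arises.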

📝 Abstract
Aligning large language models (LLMs) to desirable human values requires balancing multiple, potentially conflicting objectives such as helpfulness, truthfulness, and harmlessness, which presents a multi-objective optimisation challenge. Most alignment pipelines rely on a fixed scalarisation of these objectives, which can introduce procedural unfairness by systematically under-weighting harder-to-optimise or minority objectives. To promote more equitable trade-offs, we introduce MGDA-Decoupled, a geometry-based multi-objective optimisation algorithm that finds a shared descent direction while explicitly accounting for each objective's convergence dynamics. In contrast to prior methods that depend on reinforcement learning (e.g., GAPO) or explicit reward models (e.g., MODPO), our approach operates entirely within the lightweight Direct Preference Optimisation (DPO) paradigm. Experiments on the UltraFeedback dataset show that geometry-aware methods, and MGDA-Decoupled in particular, achieve the highest win rates against golden responses, both overall and per objective.
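To illustrate what "a shared descent direction" means, here is a minimal sketch of the classic two-objective MGDA step, which picks the minimum-norm point in the convex hull of the two gradients. This is the textbook MGDA building block, not the paper's MGDA-Decoupled variant, whose handling of convergence dynamics is not described in the abstract. The function name `min_norm_direction` is a hypothetical helper.

```python
import numpy as np

def min_norm_direction(g1, g2):
    """Classic two-gradient MGDA step (illustrative sketch).

    Returns (alpha, d) where d = alpha * g1 + (1 - alpha) * g2 is the
    minimum-norm convex combination of the two gradients.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:
        # Identical gradients: any weighting gives the same direction.
        return 0.5, g1.copy()
    # Closed-form minimiser of ||alpha*g1 + (1-alpha)*g2||^2, clipped to [0, 1].
    alpha = float(np.clip(((g2 - g1) @ g2) / denom, 0.0, 1.0))
    d = alpha * g1 + (1.0 - alpha) * g2
    return alpha, d
```

With this choice, `d @ g1` and `d @ g2` are both at least `d @ d`, so a step along `-d` does not increase either objective to first order. A fixed scalarisation, by contrast, uses constant weights regardless of how the gradients conflict.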

Problem

Research questions and friction points this paper is trying to address.

multi-objective optimisation
LLM alignment
helpfulness
truthfulness
harmlessness
Innovation

Methods, ideas, or system contributions that make the work stand out.

MGDA-Decoupled
geometry-aware optimisation
multi-objective alignment
Direct Preference Optimisation (DPO)
LLM alignment