MGDA-Decoupled: Geometry-Aware Multi-Objective Optimisation for DPO-based LLM Alignment

📅 2026-04-22

🤖 AI Summary

This work addresses the inherent conflicts among multiple alignment objectives in large language models, such as helpfulness, truthfulness, and harmlessness, where conventional fixed scalarisation tends to systematically under-weight harder-to-optimise or minority objectives. The paper proposes MGDA-Decoupled, a geometry-aware multi-objective optimisation method based on the Multiple Gradient Descent Algorithm (MGDA) that operates entirely within the Direct Preference Optimisation (DPO) framework. By finding a shared descent direction across objectives while accounting for each objective's convergence dynamics, the method aims for more equitable trade-offs without requiring reinforcement learning or explicit reward models. Experiments on the UltraFeedback dataset show that geometry-aware methods, and MGDA-Decoupled in particular, achieve the highest win rates against golden responses, both overall and per objective.

📝 Abstract

Aligning large language models (LLMs) to desirable human values requires balancing multiple, potentially conflicting objectives such as helpfulness, truthfulness, and harmlessness, which presents a multi-objective optimisation challenge. Most alignment pipelines rely on a fixed scalarisation of these objectives, which can introduce procedural unfairness by systematically under-weighting harder-to-optimise or minority objectives. To promote more equitable trade-offs, we introduce MGDA-Decoupled, a geometry-based multi-objective optimisation algorithm that finds a shared descent direction while explicitly accounting for each objective's convergence dynamics. In contrast to prior methods that depend on reinforcement learning (e.g., GAPO) or explicit reward models (e.g., MODPO), our approach operates entirely within the lightweight Direct Preference Optimisation (DPO) paradigm. Experiments on the UltraFeedback dataset show that geometry-aware methods, and MGDA-Decoupled in particular, achieve the highest win rates against golden responses, both overall and per objective.
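
To make the idea of a "shared descent direction" concrete, here is a minimal sketch of the standard two-objective MGDA step, which picks the minimum-norm convex combination of the two gradients. This is the textbook MGDA construction, not the paper's MGDA-Decoupled variant, whose handling of per-objective convergence dynamics is not described on this page. The function names and the toy gradient vectors are hypothetical and chosen only for illustration.

```python
import numpy as np


def min_norm_weight(g1: np.ndarray, g2: np.ndarray) -> float:
    """Return gamma in [0, 1] minimising ||gamma * g1 + (1 - gamma) * g2||.

    Closed form for two objectives: the unconstrained minimiser is
    gamma = ((g2 - g1) . g2) / ||g1 - g2||^2, then clipped to [0, 1].
    """
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:
        # Identical gradients: every weighting yields the same vector.
        return 0.5
    gamma = float((g2 - g1) @ g2) / denom
    return min(1.0, max(0.0, gamma))


def shared_descent_direction(g1: np.ndarray, g2: np.ndarray):
    """Return (direction, gamma), where direction is the negated
    minimum-norm convex combination of the two gradients."""
    gamma = min_norm_weight(g1, g2)
    return -(gamma * g1 + (1.0 - gamma) * g2), gamma


# Toy, partially conflicting gradients (illustrative values only).
g_helpful = np.array([1.0, 0.0])
g_harmless = np.array([-0.5, 1.0])

d_mgda, gamma = shared_descent_direction(g_helpful, g_harmless)

# A fixed scalarisation with hand-picked weights, for comparison.
d_fixed = -(0.9 * g_helpful + 0.1 * g_harmless)

# A negative inner product with a gradient means that objective's
# loss decreases to first order along the direction.
print("MGDA :", d_mgda @ g_helpful, d_mgda @ g_harmless)
print("fixed:", d_fixed @ g_helpful, d_fixed @ g_harmless)
```

In this toy case the MGDA direction has a negative inner product with both gradients, so both losses decrease to first order, whereas the 0.9/0.1 fixed weighting yields a direction that increases the second loss. In a DPO setting the gradients would presumably come from per-objective DPO losses on shared model parameters, but that wiring is an assumption here rather than something the abstract specifies.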

Problem

Research questions and friction points this paper is trying to address.

multi-objective optimisation

LLM alignment

helpfulness

truthfulness

harmlessness

Innovation

Methods, ideas, or system contributions that make the work stand out.

MGDA-Decoupled

geometry-aware optimisation

multi-objective alignment