One Model for All: Multi-Objective Controllable Language Models

πŸ“… 2026-04-06
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing language-model alignment methods struggle to achieve personalized, controllable generation under multi-objective trade-offs such as empathy, efficiency, and accuracy. This work proposes a Multi-Objective Control (MOC) framework that, for the first time, integrates multi-objective optimization into the Reinforcement Learning from Human Feedback (RLHF) pipeline. MOC trains a single policy network that generates responses conditioned on user-specified preferences, covering diverse requirements along the Pareto front. By combining preference-conditioned policies, Pareto-front modeling, and efficient policy-level multi-objective optimization, the method is efficient enough to fine-tune a 7B-parameter model on a single A6000 GPU. It significantly outperforms existing baselines in controllability, output quality and diversity (measured by hypervolume), and generalization to unseen preference combinations.
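The core mechanism described above (one policy conditioned on a user preference vector, so that different preferences land at different points of the reward trade-off) can be sketched with a toy example. Note this is only an illustration: linear scalarization stands in for the paper's policy-level multi-objective optimization, and the candidate responses and reward numbers are invented for the sketch.

```python
import random

def sample_preference(num_objectives):
    """Draw a random preference vector from the simplex (Dirichlet(1) via exponentials)."""
    draws = [random.expovariate(1.0) for _ in range(num_objectives)]
    total = sum(draws)
    return [d / total for d in draws]

def scalarize(rewards, preference):
    """Linear scalarization: collapse multiple rewards into one training signal."""
    return sum(w * r for w, r in zip(preference, rewards))

# Toy candidate responses with (empathy, efficiency) rewards -- illustrative numbers only.
candidates = {
    "warm, detailed reply": (0.9, 0.2),
    "terse, factual reply": (0.1, 0.9),
    "balanced reply": (0.6, 0.6),
}

def best_response(preference):
    """A stand-in for a preference-conditioned policy: pick the candidate
    maximizing the preference-weighted reward."""
    return max(candidates, key=lambda c: scalarize(candidates[c], preference))

print(best_response([0.9, 0.1]))  # empathy-heavy preference -> warm, detailed reply
print(best_response([0.1, 0.9]))  # efficiency-heavy preference -> terse, factual reply
```

In training, the preference vector would be resampled per example and fed to the model alongside the prompt, so a single set of weights learns the whole preference-to-behavior mapping rather than one model per trade-off.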
πŸ“ Abstract
Aligning large language models (LLMs) with human preferences is critical for enhancing LLMs' safety, helpfulness, humor, faithfulness, etc. Current reinforcement learning from human feedback (RLHF) mainly focuses on a fixed reward learned from average human ratings, which may weaken the adaptability and controllability of varying preferences. However, creating personalized LLMs requires aligning LLMs with individual human preferences, which is non-trivial due to the scarce data per user and the diversity of user preferences in multi-objective trade-offs, varying from emphasizing empathy in certain contexts to demanding efficiency and precision in others. Can we train one LLM to produce personalized outputs across different user preferences on the Pareto front? In this paper, we introduce Multi-Objective Control (MOC), which trains a single LLM to directly generate responses in the preference-defined regions of the Pareto front. Our approach introduces multi-objective optimization (MOO) principles into RLHF to train an LLM as a preference-conditioned policy network. We improve the computational efficiency of MOC by applying MOO at the policy level, enabling us to fine-tune a 7B-parameter model on a single A6000 GPU. Extensive experiments demonstrate the advantages of MOC over baselines in three aspects: (i) controllability of LLM outputs w.r.t. user preferences on the trade-off among multiple rewards; (ii) quality and diversity of LLM outputs, measured by the hyper-volume of multiple solutions achieved; and (iii) generalization to unseen preferences. These results highlight MOC's potential for real-world applications requiring scalable and customizable LLMs.
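Hypervolume, the quality-and-diversity metric the abstract uses, is the size of the objective-space region dominated by a solution set relative to a reference point; a larger value means a better and more spread-out approximation of the Pareto front. A minimal two-objective version (maximization; the reward points and reference point below are illustrative):

```python
def hypervolume_2d(points, ref):
    """Hypervolume (dominated area) of 2-D reward vectors under maximization,
    measured against reference point `ref`. Larger is better."""
    area, prev_y = 0.0, ref[1]
    # Visit points from largest x to smallest; keeping only strict improvements
    # in y discards dominated points automatically.
    for x, y in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        if y > prev_y and x > ref[0]:
            area += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return area

# Rewards achieved by three preference settings -- illustrative numbers.
front = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
print(hypervolume_2d(front, ref=(0.0, 0.0)))  # 6.0
```

A dominated point such as `(1.5, 1.5)` adds nothing to the area, which is why hypervolume rewards sets that trace the front rather than cluster in one region.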
Problem

Research questions and friction points this paper is trying to address.

personalized LLMs
multi-objective trade-offs
user preferences
Pareto front
controllable language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Objective Control
Preference-conditioned Policy
Pareto Front
Reinforcement Learning from Human Feedback
Scalable Personalization