MAVIS: Multi-Objective Alignment via Value-Guided Inference-Time Search

📅 2025-08-18

🤖 AI Summary

Large language models (LLMs) face inherent trade-offs among alignment objectives such as helpfulness, harmlessness, and humor, and fine-tuning a separate model for each objective or preference configuration is computationally expensive and inflexible.

Method: This paper proposes MAVIS, a lightweight inference-time multi-objective alignment framework that leaves the base model's weights unchanged. A set of small value models, one per objective, is trained with a simple iterative algorithm that monotonically improves a KL-regularized policy. At inference time, the value models are combined using user-specified weights into a tilting function that shifts the base model's output distribution toward the desired trade-off.

Contribution/Results: MAVIS avoids per-preference fine-tuning and lets users rebalance objectives at inference time simply by changing the weights. Empirically, it outperforms baselines that fine-tune per-objective models and combine them post hoc, and it approaches the performance of the idealized setting in which a model is fine-tuned for the user's exact preferences.

📝 Abstract

Large Language Models (LLMs) are increasingly deployed across diverse applications that demand balancing multiple, often conflicting, objectives, such as helpfulness, harmlessness, or humor. Aligning outputs to user-specific preferences in such multi-objective settings typically requires fine-tuning models for each objective or preference configuration, which is computationally expensive and inflexible. We introduce MAVIS (Multi-Objective Alignment via Value-Guided Inference-Time Search), a lightweight inference-time alignment framework that enables dynamic control over LLM behavior without modifying the base model's weights. MAVIS trains a set of small value models, each corresponding to a distinct objective. At inference time, these value models are combined using user-specified weights to produce a tilting function that adjusts the base model's output distribution toward desired trade-offs. The value models are trained using a simple iterative algorithm that ensures monotonic improvement of the KL-regularized policy. We show empirically that MAVIS outperforms baselines that fine-tune per-objective models and combine them post hoc, and even approaches the performance of the idealized setting where models are fine-tuned for a user's exact preferences.
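For reference, the standard closed-form solution of a KL-regularized objective shows what a weighted "tilting function" of this kind looks like. This is a sketch under assumptions: r_i (a reward for objective i), w_i (the user weight for objective i), β (the KL-regularization strength), Z(x) (the normalizer over outputs), and Q_i (a token-level value for objective i) are notation introduced here, and the abstract does not state that MAVIS uses exactly this form. The first two lines are the textbook result for a sequence-level objective; the third is a token-level analogue that combines per-objective values additively, which is an approximation, because the soft value of a weighted reward does not in general equal the weighted sum of per-objective soft values.

```latex
\begin{align*}
&\max_{\pi}\; \mathbb{E}_{y \sim \pi(\cdot \mid x)}\Big[\textstyle\sum_{i=1}^{m} w_i\, r_i(x, y)\Big]
  - \beta\, \mathrm{KL}\big(\pi(\cdot \mid x)\,\|\,\pi_{\text{base}}(\cdot \mid x)\big) \\
&\pi^{*}(y \mid x) = \frac{1}{Z(x)}\, \pi_{\text{base}}(y \mid x)\,
  \exp\!\Big(\tfrac{1}{\beta} \textstyle\sum_{i=1}^{m} w_i\, r_i(x, y)\Big) \\
&\pi(a_t \mid s_t) \propto \pi_{\text{base}}(a_t \mid s_t)\,
  \exp\!\Big(\tfrac{1}{\beta} \textstyle\sum_{i=1}^{m} w_i\, Q_i(s_t, a_t)\Big)
\end{align*}
```

Under this form, changing the weights w_i changes only the exponent, so the base model itself never has to be retrained to move along the trade-off.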

Problem

Balancing conflicting objectives in LLM outputs

Avoiding expensive fine-tuning for each preference

Enabling dynamic control of LLM behavior

Innovation

Inference-time search that leaves the base model's weights unchanged

Small value models trained for distinct objectives

A tilting function, built from user-weighted value models, shifts the output distribution toward the desired trade-off
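The tilting step listed above can be sketched at the level of a single next-token decision. This is a minimal sketch, assuming that each value model returns a scalar estimate per candidate token and that the tilt has the exponential form exp(Σ w_i V_i / β) taken from standard KL-regularized alignment; the function names, arguments, and this interface are hypothetical and are not the MAVIS implementation. A practical decoder would typically score only a top-k subset of the vocabulary with the value models.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_z for l in logits]


def tilted_next_token_probs(
    base_logits: Sequence[float],
    values: Sequence[Sequence[float]],
    weights: Sequence[float],
    beta: float,
) -> list[float]:
    """Tilt a base model's next-token distribution with weighted value estimates.

    Hypothetical interface (an assumption, not the paper's code):
      base_logits[j]: base model logit for candidate token j
      values[i][j]:   value-model estimate for objective i if token j is chosen
      weights[i]:     user-specified weight for objective i
      beta:           KL-regularization strength (must be > 0)
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if len(values) != len(weights):
        raise ValueError("need exactly one weight per value model")
    base_logp = log_softmax(base_logits)
    tilted = []
    for j, lp in enumerate(base_logp):
        # Weighted combination of the per-objective values for candidate token j.
        combined = sum(w * v[j] for w, v in zip(weights, values))
        # Assumed tilt: add combined value / beta to the base log-probability.
        tilted.append(lp + combined / beta)
    # Renormalize the tilted log-probabilities into a probability distribution.
    return [math.exp(x) for x in log_softmax(tilted)]


if __name__ == "__main__":
    base = [2.0, 1.0, 0.5]
    helpful = [0.1, 0.9, 0.2]   # illustrative values, not from the paper
    harmless = [0.8, 0.3, 0.9]  # illustrative values, not from the paper
    probs = tilted_next_token_probs(base, [helpful, harmless], weights=[0.7, 0.3], beta=0.5)
    print(probs)
```

Setting every weight to zero recovers the base distribution, and a larger β keeps the tilted distribution closer to the base model; rebalancing objectives only means passing different `weights`.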
