🤖 AI Summary
This work addresses two challenges large language models face when aligning with diverse, and often conflicting, human preferences: high computational cost and limited generalization. To overcome these limitations, the authors propose the Inference-Aware Meta-Alignment (IAMA) framework, which applies meta-optimization during training so that the base model can adapt efficiently to multiple alignment criteria at inference time. IAMA is the first approach to enable low-cost, multi-criterion alignment at inference time, and it introduces a provably convergent non-linear GRPO algorithm that optimizes model adaptability in the space of probability measures. Experimental results demonstrate that IAMA substantially reduces the computational overhead of inference-time alignment while preserving alignment performance across multiple conflicting preferences.
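To make the meta-optimization idea concrete, the training objective can be sketched in a generic MAML-style form; note this is an illustrative formulation under our own notation, not the paper's actual objective. Here $\theta$ are the base-model parameters, $c$ ranges over alignment criteria, $A_c$ denotes the inference-time alignment algorithm for criterion $c$ applied to the model $\pi_\theta$, and $\mathcal{L}_c$ is the alignment loss for that criterion:

$$
\min_{\theta} \; \mathbb{E}_{c \sim \mathcal{C}} \left[ \mathcal{L}_c\!\left( A_c(\pi_\theta) \right) \right]
$$

The key difference from standard alignment training is that the loss is evaluated *after* the inference-time adaptation step $A_c$, so the base model is optimized to be easy to adapt rather than to satisfy any single criterion directly.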
📝 Abstract
Aligning large language models (LLMs) to diverse human preferences is fundamentally challenging since criteria can often conflict with each other. Inference-time alignment methods have recently gained popularity as they allow LLMs to be aligned to multiple criteria via different alignment algorithms at inference time. However, inference-time alignment is computationally expensive since it often requires multiple forward passes of the base model. In this work, we propose inference-aware meta-alignment (IAMA), a novel approach that enables LLMs to be aligned to multiple criteria with a limited computational budget at inference time. IAMA trains a base model such that it can be effectively aligned to multiple criteria via different inference-time alignment algorithms. To solve the non-linear optimization problems involved in IAMA, we propose non-linear GRPO, which provably converges to the optimal solution in the space of probability measures.
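The abstract's cost argument (inference-time alignment "often requires multiple forward passes of the base model") can be illustrated with a toy best-of-N reward-guided decoding loop. Everything here is a hypothetical stand-in: `generate` and `reward` are placeholder functions, not the paper's models, and best-of-N is just one common inference-time alignment algorithm used for illustration.

```python
import random

def generate(prompt: str, seed: int) -> list[float]:
    # Hypothetical stand-in for ONE forward pass of the base model.
    rng = random.Random(seed)
    return [rng.random() for _ in range(3)]

def reward(criterion: str, output: list[float]) -> float:
    # Hypothetical per-criterion reward model (illustrative weights only).
    weights = {"helpfulness": (1, 0, 0), "safety": (0, 1, 0), "brevity": (0, 0, 1)}
    return sum(w * x for w, x in zip(weights[criterion], output))

def best_of_n(prompt: str, criterion: str, n: int):
    # Best-of-N inference-time alignment: n forward passes per criterion.
    candidates = [generate(prompt, seed) for seed in range(n)]
    best = max(candidates, key=lambda o: reward(criterion, o))
    return best, n  # return the forward-pass count alongside the winner

# Aligning one prompt to three conflicting criteria multiplies the cost:
total_passes = 0
for crit in ("helpfulness", "safety", "brevity"):
    _, passes = best_of_n("example prompt", crit, n=8)
    total_passes += passes
print(total_passes)  # 3 criteria x 8 samples = 24 forward passes
```

The point of the sketch is the multiplicative cost (criteria x samples per criterion); IAMA's goal, per the abstract, is to train the base model so the same multi-criterion alignment works within a much smaller inference budget.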