Collab: Controlled Decoding using Mixture of Agents for LLM Alignment

📅 2025-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of efficient inference-time alignment for large language models (LLMs). The authors propose a lightweight, fine-tuning-free alignment framework that enables real-time adaptation across multiple tasks. The core innovation is a multi-agent collaborative decoding mechanism: at each token, the most suitable LLM is dynamically selected based on a long-horizon utility estimate. This integrates hybrid agent control, token-level policy switching, ensembling of off-the-shelf aligned models, and target-reward-driven utility modeling. The method is theoretically shown to converge toward optimal alignment and delivers strong empirical gains: average reward improves by up to 1.56× in multi-task preference alignment, and the GPT-4-based win-tie rate improves by 71.89%. This establishes a low-overhead, robust paradigm for inference-time alignment.
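
To make the selection mechanism concrete, here is a minimal Python sketch of token-level policy switching among a pool of agents. All names here (`agents`, `next_token`, `utility`) are hypothetical stand-ins, not the authors' API: in the paper the utility is a long-horizon Q-estimate under the target reward, abstracted below as a callable.

```python
# Minimal sketch of Collab-style token-level policy switching.
# Assumptions (illustrative, not the paper's implementation):
#   - each agent exposes next_token(context) -> int  (its greedy proposal)
#   - utility(context, token) -> float  (estimated long-term utility Q)

def collab_decode(agents, utility, prompt_ids, max_new_tokens, eos_id):
    """Mixture-of-agents decoding: at every step, each agent proposes
    its next token, and the proposal with the highest estimated
    long-term utility is committed to the shared context."""
    context = list(prompt_ids)
    for _ in range(max_new_tokens):
        best_token, best_score = None, float("-inf")
        for agent in agents:
            token = agent.next_token(context)   # this agent's proposal
            score = utility(context, token)     # estimated Q(context, token)
            if score > best_score:
                best_token, best_score = token, score
        context.append(best_token)              # the policy switch is implicit
        if best_token == eos_id:
            break
    return context
```

The key design point is that switching happens per token rather than per response, so different agents can contribute to different spans of the same output.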

📝 Abstract
Alignment of Large Language Models (LLMs) is crucial for safe and trustworthy deployment in applications. Reinforcement learning from human feedback (RLHF) has emerged as an effective technique to align LLMs to human preferences and broader utilities, but it requires updating billions of model parameters, which is computationally expensive. Controlled Decoding, by contrast, provides a mechanism for aligning a model at inference time without retraining. However, single-agent decoding approaches often struggle to adapt to diverse tasks due to the complexity and variability inherent in these tasks. To strengthen the test-time performance w.r.t. the target task, we propose a mixture of agent-based decoding strategies leveraging the existing off-the-shelf aligned LLM policies. Treating each prior policy as an agent in the spirit of mixture-of-agents collaboration, we develop a decoding method that allows for inference-time alignment through a token-level selection strategy among multiple agents. For each token, the most suitable LLM is dynamically chosen from a pool of models based on a long-term utility metric. This policy-switching mechanism ensures optimal model selection at each step, enabling efficient collaboration and alignment among LLMs during decoding. Theoretical analysis of our proposed algorithm establishes optimal performance with respect to the target task represented via a target reward for the given off-the-shelf models. We conduct comprehensive empirical evaluations with open-source aligned models on diverse tasks and preferences, which demonstrate the merits of this approach over single-agent decoding baselines. Notably, Collab surpasses the current SoTA decoding strategy, achieving an improvement of up to 1.56× in average reward and 71.89% in GPT-4 based win-tie rate.
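
The "long-term utility metric" in the abstract plays the role of a token-level value function under the target reward. As an illustration only (not the paper's estimator), one way to approximate such a value without any training is a Monte-Carlo lookahead: roll out a short continuation after the candidate token with some reference policy and score it with the target reward model. The `rollout_policy` and `reward_model` interfaces below are assumptions.

```python
def mc_utility(context, token, rollout_policy, reward_model,
               horizon=32, num_samples=4):
    """Monte-Carlo estimate of the long-term utility Q(context, token):
    the average target reward over short rollouts that begin with the
    candidate token. All interfaces here are hypothetical."""
    total = 0.0
    for _ in range(num_samples):
        # Sample a short continuation starting from the candidate token.
        continuation = rollout_policy.sample(context + [token],
                                             max_tokens=horizon)
        # Score the full sequence with the target reward model.
        total += reward_model.score(context + [token] + continuation)
    return total / num_samples
```

An estimator of this shape could be passed as the `utility` argument in the decoding sketch above; the trade-off is extra rollout compute per candidate token in exchange for not retraining any model.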
Problem

Research questions and friction points this paper is trying to address.

Aligning LLMs without retraining using controlled decoding
Improving adaptability to diverse tasks with multi-agent strategies
Enhancing decoding performance via dynamic model selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture of agent-based decoding strategies
Dynamic token-level selection among agents
Policy-switching for optimal model selection