Improving Model Alignment Through Collective Intelligence of Open-Source LLMs

📅 2025-05-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high cost, poor scalability, and limited diversity and generalization of human-annotated alignment data, this paper proposes the Mixture-of-Agents Alignment (MoAA) framework. MoAA leverages multiple open-source large language models (LLMs) in concert to generate high-quality, diverse, and scalable synthetic alignment data—without requiring supervision from stronger external models—thereby establishing a self-enhancing training loop. By integrating a mixture-of-agents architecture with multi-model consensus sampling, MoAA overcomes the diversity bottleneck inherent in single-model synthetic data generation. Evaluated on LLaMA-3.1-8B-Instruct, MoAA improves Arena-Hard win rate from 19.5% to 48.3% and AlpacaEval 2 score from 22.33 to 57.23. It significantly enhances both supervised fine-tuning (SFT) and direct preference optimization (DPO). To our knowledge, this is the first work to enable efficient, low-cost, and highly generalizable autonomous construction of alignment data via multi-model collaboration.
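The pipeline described above can be illustrated with a minimal sketch. This is not the paper's released code; the model names, the `propose` and `aggregate` stubs, and the canned responses are all hypothetical stand-ins for real open-source LLM inference calls, kept only to show the shape of MoA aggregation (for SFT targets) and multi-model consensus sampling (for selecting a preferred response):

```python
from collections import Counter

# Hypothetical stand-in for querying an open-source proposer LLM; in the
# actual MoAA pipeline this would be an inference call to a real model.
def propose(model_name: str, prompt: str) -> str:
    canned = {
        "model_a": f"Answer A to: {prompt}",
        "model_b": f"Answer A to: {prompt}",  # happens to agree with model_a
        "model_c": f"Answer C to: {prompt}",
    }
    return canned[model_name]

# Stub aggregator: a real MoA layer would prompt an aggregator LLM to
# synthesize the candidates into one improved response; here we concatenate.
def aggregate(candidates: list[str]) -> str:
    return " | ".join(candidates)

def moaa_sft_example(prompt: str, models: list[str]) -> dict:
    """Build one synthetic SFT pair: the MoA-aggregated response
    serves as the training target for the instruction."""
    candidates = [propose(m, prompt) for m in models]
    return {"instruction": prompt, "response": aggregate(candidates)}

def consensus_sample(prompt: str, models: list[str]) -> tuple[str, int]:
    """Multi-model consensus sampling (sketch): gather one candidate per
    model and keep the response the most models agree on, with its vote
    count. Agreement across diverse models acts as a quality signal."""
    candidates = [propose(m, prompt) for m in models]
    winner, votes = Counter(candidates).most_common(1)[0]
    return winner, votes

models = ["model_a", "model_b", "model_c"]
pair = moaa_sft_example("What is 2+2?", models)
winner, votes = consensus_sample("What is 2+2?", models)
print(votes)  # 2 of the 3 stub models agree
```

With real models, the consensus winner and a low-vote candidate would form a chosen/rejected pair for DPO, while the aggregated response supplies the SFT target.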

📝 Abstract
Building helpful and harmless large language models (LLMs) requires an effective model alignment approach based on human instructions and feedback, which necessitates high-quality human-labeled data. Constructing such datasets is often expensive and hard to scale, and may face potential limitations on diversity and generalization. To address these challenges, we introduce Mixture of Agents Alignment (MoAA), which leverages the collective strengths of various language models to provide high-quality data for model alignment. By employing MoAA, we enhance both supervised fine-tuning and preference optimization, leading to improved performance compared to using a single model alone to generate alignment data (e.g. using GPT-4o alone). Evaluation results show that our approach can improve the win rate of LLaMA-3.1-8B-Instruct from 19.5 to 48.3 on Arena-Hard and from 22.33 to 57.23 on AlpacaEval 2, highlighting a promising direction for model alignment through this new scalable and diverse synthetic data recipe. Furthermore, we demonstrate that MoAA enables a self-improvement pipeline, where models finetuned on MoA-generated data surpass their own initial capabilities, providing evidence that our approach can push the frontier of open-source LLMs without reliance on stronger external supervision. Data and code will be released.
Problem

Research questions and friction points this paper is trying to address.

High-cost human-labeled data limits the scalability of LLM alignment.
Alignment data generated by a single model lacks diversity and generalization.
Can the collective strengths of multiple open-source models yield better alignment data than any single model?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages collective strengths of various language models
Enhances supervised fine-tuning and preference optimization
Enables self-improvement pipeline for model capabilities