SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat

πŸ“… 2025-06-05
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the limitations of single large language models (LLMs) in response diversity and evaluation bias, this paper proposes a multi-model collaborative alignment framework in which the participating models form a "sparta tribe." It employs instruction-driven pairwise adversarial generation, cross-model mutual evaluation, and Elo-based reputation-weighted aggregation to dynamically construct high-quality preference data and jointly optimize model behavior. The core contribution is the introduction of a *dynamic competitive alignment paradigm* that integrates reputation-aware mutual evaluation with collective preference learning, enabling knowledge complementarity and bias mitigation. Technically, the framework unifies an adaptive Elo reputation system, multi-model cross-evaluation, Direct Preference Optimization (DPO), and iterative collective reinforcement learning. Evaluated across 12 diverse tasks, SPARTA ALIGNMENT outperforms baselines on 10 of them, achieving an average improvement of 7.0%, with notable gains in generalization, logical coherence, expressive directness, and information density.

πŸ“ Abstract
We propose SPARTA ALIGNMENT, an algorithm to collectively align multiple LLMs through competition and combat. To complement a single model's lack of diversity in generation and biases in evaluation, multiple LLMs form a "sparta tribe" to compete against each other in fulfilling instructions while serving as judges for the competition of others. For each iteration, one instruction and two models are selected for a duel, the other models evaluate the two responses, and their evaluation scores are aggregated through an adapted Elo-ranking-based reputation system, where winners/losers of combat gain/lose weight in evaluating others. The peer-evaluated combat results then become preference pairs where the winning response is preferred over the losing one, and all models learn from these preferences at the end of each iteration. SPARTA ALIGNMENT enables the self-evolution of multiple LLMs in an iterative and collective competition process. Extensive experiments demonstrate that SPARTA ALIGNMENT outperforms initial models and 4 self-alignment baselines across 10 out of 12 tasks and datasets with 7.0% average improvement. Further analysis reveals that SPARTA ALIGNMENT generalizes more effectively to unseen tasks and leverages the expertise diversity of participating models to produce more logical, direct and informative outputs.
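The abstract's "adapted Elo-ranking-based reputation system" is not specified in detail here; a minimal sketch of a standard Elo update, which the paper adapts, might look like the following. The logistic expected-score model and the `k=32.0` K-factor are conventional Elo defaults, not values taken from the paper.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo logistic estimate of A's win probability against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_winner: float, r_loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return updated (winner, loser) ratings after one duel.

    The winner gains rating in proportion to how unexpected the win was;
    the loser loses the same amount. In SPARTA ALIGNMENT this rating also
    serves as a model's evaluation weight when it judges others' duels.
    """
    e_w = expected_score(r_winner, r_loser)
    return r_winner + k * (1.0 - e_w), r_loser + k * (0.0 - e_w)
```

Under this scheme an upset win (low-rated model beating a high-rated one) moves both ratings more than an expected win, which is what lets reputations adapt quickly over iterations.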
Problem

Research questions and friction points this paper is trying to address.

Aligning multiple LLMs through competitive combat and peer evaluation
Addressing single model biases and lack of generation diversity
Improving model performance via collective self-evolution and preference learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiple LLMs compete in peer-evaluated duels
Adapted Elo-ranking aggregates evaluation scores
Iterative combat process enhances model alignment
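The three innovation points above can be sketched as one iteration of the loop: sample two duelists, let the remaining models judge, weight their votes by reputation, and emit a DPO preference pair. Everything below is a hypothetical illustration of that flow, not the paper's implementation; in particular, `judge_scores` is stubbed with response length as a scoring proxy, whereas the real framework queries each peer LLM.

```python
import random

def judge_scores(judge, instruction: str, resp_a: str, resp_b: str):
    """Hypothetical judge stub: scores responses by length as a proxy.
    In the actual framework, each peer LLM scores the two responses."""
    return float(len(resp_a)), float(len(resp_b))

def run_iteration(models: dict, ratings: dict, instruction: str):
    """One SPARTA-style iteration sketch (names here are illustrative).

    models:  name -> callable producing a response for an instruction
    ratings: name -> current Elo reputation (used as judging weight)
    """
    a, b = random.sample(list(models), 2)              # select two duelists
    resp_a, resp_b = models[a](instruction), models[b](instruction)
    # Remaining models judge both responses; each vote is weighted
    # by the judge's current reputation (Elo rating).
    score_a = score_b = 0.0
    for judge in models:
        if judge in (a, b):
            continue
        s_a, s_b = judge_scores(models[judge], instruction, resp_a, resp_b)
        score_a += ratings[judge] * s_a
        score_b += ratings[judge] * s_b
    winner, loser = (a, b) if score_a >= score_b else (b, a)
    # The winning response becomes the preferred side of a preference
    # pair, which all models would then learn from via DPO.
    preferred = resp_a if winner == a else resp_b
    rejected = resp_b if winner == a else resp_a
    return winner, loser, (instruction, preferred, rejected)
```

After each duel the winner's and loser's Elo ratings would be updated, so models that win duels gain weight as judges in later iterations, closing the reputation loop the abstract describes.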