SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat

πŸ“… 2025-06-05
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the limitations of single large language models (LLMs) in response diversity and evaluation bias, this paper proposes a multi-model collaborative alignment framework in which the participating models form a "sparta tribe." It employs instruction-driven pairwise adversarial generation, cross-model mutual evaluation, and Elo-based reputation-weighted aggregation to dynamically construct high-quality preference data and jointly optimize model behavior. The core contribution is the introduction of a *dynamic competitive alignment paradigm* that integrates reputation-aware mutual evaluation with collective preference learning, enabling knowledge complementarity and bias mitigation. Technically, the framework unifies an adaptive Elo reputation system, multi-model cross-evaluation, Direct Preference Optimization (DPO), and iterative collective reinforcement learning. Evaluated across 12 diverse tasks, SPARTA ALIGNMENT outperforms baselines on 10 of them, achieving an average improvement of 7.0%, with notable gains in generalization, logical coherence, expressive directness, and information density.

πŸ“ Abstract
We propose SPARTA ALIGNMENT, an algorithm to collectively align multiple LLMs through competition and combat. To complement a single model's lack of diversity in generation and biases in evaluation, multiple LLMs form a "sparta tribe" to compete against each other in fulfilling instructions while serving as judges for the competition of others. For each iteration, one instruction and two models are selected for a duel, the other models evaluate the two responses, and their evaluation scores are aggregated through an adapted Elo-ranking-based reputation system, where winners/losers of combat gain/lose weight in evaluating others. The peer-evaluated combat results then become preference pairs where the winning response is preferred over the losing one, and all models learn from these preferences at the end of each iteration. SPARTA ALIGNMENT enables the self-evolution of multiple LLMs in an iterative and collective competition process. Extensive experiments demonstrate that SPARTA ALIGNMENT outperforms initial models and 4 self-alignment baselines across 10 out of 12 tasks and datasets with 7.0% average improvement. Further analysis reveals that SPARTA ALIGNMENT generalizes more effectively to unseen tasks and leverages the expertise diversity of participating models to produce more logical, direct and informative outputs.
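The abstract's "adapted Elo-ranking-based reputation system" is not specified in detail here; a minimal sketch of a standard Elo update, which the paper adapts, might look like the following. The logistic expected-score model and the `k=32.0` K-factor are conventional Elo defaults, not values taken from the paper.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo logistic estimate of A's win probability against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_winner: float, r_loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return updated (winner, loser) ratings after one duel.

    The winner gains rating in proportion to how unexpected the win was;
    the loser loses the same amount. In SPARTA ALIGNMENT this rating also
    serves as a model's evaluation weight when it judges others' duels.
    """
    e_w = expected_score(r_winner, r_loser)
    return r_winner + k * (1.0 - e_w), r_loser + k * (0.0 - e_w)
```

Under this scheme an upset win (low-rated model beating a high-rated one) moves both ratings more than an expected win, which is what lets reputations adapt quickly over iterations.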
Problem

Research questions and friction points this paper is trying to address.

Aligning multiple LLMs through competitive combat and peer evaluation
Addressing single model biases and lack of generation diversity
Improving model performance via collective self-evolution and preference learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiple LLMs compete in peer-evaluated duels
Adapted Elo-ranking aggregates evaluation scores
Iterative combat process enhances model alignment
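The three innovation points above can be sketched as one iteration of the loop: sample two duelists, let the remaining models judge, weight their votes by reputation, and emit a DPO preference pair. Everything below is a hypothetical illustration of that flow, not the paper's implementation; in particular, `judge_scores` is stubbed with response length as a scoring proxy, whereas the real framework queries each peer LLM.

```python
import random

def judge_scores(judge, instruction: str, resp_a: str, resp_b: str):
    """Hypothetical judge stub: scores responses by length as a proxy.
    In the actual framework, each peer LLM scores the two responses."""
    return float(len(resp_a)), float(len(resp_b))

def run_iteration(models: dict, ratings: dict, instruction: str):
    """One SPARTA-style iteration sketch (names here are illustrative).

    models:  name -> callable producing a response for an instruction
    ratings: name -> current Elo reputation (used as judging weight)
    """
    a, b = random.sample(list(models), 2)              # select two duelists
    resp_a, resp_b = models[a](instruction), models[b](instruction)
    # Remaining models judge both responses; each vote is weighted
    # by the judge's current reputation (Elo rating).
    score_a = score_b = 0.0
    for judge in models:
        if judge in (a, b):
            continue
        s_a, s_b = judge_scores(models[judge], instruction, resp_a, resp_b)
        score_a += ratings[judge] * s_a
        score_b += ratings[judge] * s_b
    winner, loser = (a, b) if score_a >= score_b else (b, a)
    # The winning response becomes the preferred side of a preference
    # pair, which all models would then learn from via DPO.
    preferred = resp_a if winner == a else resp_b
    rejected = resp_b if winner == a else resp_a
    return winner, loser, (instruction, preferred, rejected)
```

After each duel the winner's and loser's Elo ratings would be updated, so models that win duels gain weight as judges in later iterations, closing the reputation loop the abstract describes.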