🤖 AI Summary
Language models often produce inconsistent and contradictory responses during reasoning due to unreliable path selection, and existing inference-time methods fail to fundamentally resolve this inconsistency.
Method: We propose Multi-Agent Consensus Alignment (MACA), a reinforcement learning framework that internalizes self-consistency as a learnable model property: multiple agents concurrently generate reasoning paths through deliberative debate, and majority/minority outcomes among them supply a consensus-alignment reward that guides post-training without external supervision. Because agents ground their reasoning in peer arguments rather than sampling independently, the resulting consensus signal is richer than single-round majority voting.
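As an illustrative sketch (not the paper's actual implementation), the consensus-alignment signal can be thought of as follows: each agent's debate trajectory ends in a parsed final answer, the majority answer defines the internal consensus, and trajectories are rewarded by whether they agree with it. The function name and reward values below are hypothetical.

```python
from collections import Counter

def consensus_rewards(final_answers):
    """Hypothetical MACA-style consensus signal: reward each agent's
    trajectory by agreement with the multi-agent majority answer.

    final_answers: parsed final answer from each agent's reasoning
    trajectory for the same prompt.
    """
    counts = Counter(final_answers)
    majority_answer, _ = counts.most_common(1)[0]
    # Trajectories matching the internal consensus get a positive
    # reward; minority trajectories get a negative one (values are
    # illustrative, not taken from the paper).
    return [1.0 if ans == majority_answer else -1.0
            for ans in final_answers]

# Four agents debate; three converge on "42", one dissents.
rewards = consensus_rewards(["42", "42", "17", "42"])
```

These per-trajectory rewards would then feed a standard policy-gradient update, so the model learns to favor reasoning paths that align with its own consensus rather than relying on voting at inference time.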
Results: Experiments demonstrate substantial improvements: +27.6% self-consistency on GSM8K; +23.7% single-agent accuracy on MATH; +42.7% multi-agent decision accuracy on MathQA; and strong generalization to unseen benchmarks such as GPQA.
📝 Abstract
Language Models (LMs) are inconsistent reasoners, often generating contradictory responses to identical prompts. While inference-time methods can mitigate these inconsistencies, they fail to address the core problem: LMs struggle to reliably select reasoning pathways leading to consistent outcomes under exploratory sampling. To address this, we formalize self-consistency as an intrinsic property of well-aligned reasoning models and introduce Multi-Agent Consensus Alignment (MACA), a reinforcement learning framework that post-trains models to favor reasoning trajectories aligned with their internal consensus using majority/minority outcomes from multi-agent debate. These trajectories emerge from deliberative exchanges where agents ground reasoning in peer arguments, not just aggregation of independent attempts, creating richer consensus signals than single-round majority voting. MACA enables agents to teach themselves to be more decisive and concise, and better leverage peer insights in multi-agent settings without external supervision, driving substantial improvements across self-consistency (+27.6% on GSM8K), single-agent reasoning (+23.7% on MATH), sampling-based inference (+22.4% Pass@20 on MATH), and multi-agent ensemble decision-making (+42.7% on MathQA). These findings, coupled with strong generalization to unseen benchmarks (+16.3% on GPQA, +11.6% on CommonsenseQA), demonstrate robust self-alignment that more reliably unlocks latent reasoning potential of language models.