AI Debate Aids Assessment of Controversial Claims

📅 2025-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the risk that AI systems—particularly in contentious domains such as public health—amplify misinformation and exacerbate societal polarization. To mitigate this, we propose a novel belief-calibration paradigm grounded in adversarial debate between two complementary AI agents. Our method introduces an evidence-driven adversarial reasoning framework integrating human belief modeling, personalized AI-judge fine-tuning, and a structured dual-system debate protocol, rigorously evaluated through controlled human-subject experiments. Key contributions include: (1) the first empirical demonstration that AI-mediated adversarial debate significantly reduces human belief bias—improving overall judgment accuracy by 10% (15.2% among mainstream believers; 4.7% among skeptics); and (2) a personalized AI judge achieving 78.5% accuracy—surpassing both human evaluators (70.1%) and default AI baselines (69.8%)—thereby overcoming limitations of conventional single-advisor supervision and advancing scalable, robust AI governance.

📝 Abstract
As AI grows more powerful, it will increasingly shape how we understand the world. But with this influence comes the risk of amplifying misinformation and deepening social divides, especially on consequential topics like public health, where factual accuracy directly impacts well-being. Scalable Oversight aims to ensure AI truthfulness by enabling humans to supervise systems that may exceed human capabilities, yet humans themselves hold different beliefs and biases that impair their judgment. We study whether AI debate can guide biased judges toward the truth by having two AI systems debate opposing sides of controversial COVID-19 factuality claims where people hold strong prior beliefs. We conduct two studies: one with human judges holding either mainstream or skeptical beliefs evaluating factuality claims through AI-assisted debate or consultancy protocols, and a second examining the same problem with personalized AI judges designed to mimic these different human belief systems. In our human study, we find that debate, where two AI advisor systems present opposing evidence-based arguments, consistently improves judgment accuracy and confidence calibration, outperforming consultancy with a single-advisor system by 10% overall. The improvement is most significant for judges with mainstream beliefs (+15.2% accuracy), though debate also helps skeptical judges who initially misjudge claims move toward accurate views (+4.7% accuracy). In our AI judge study, we find that AI judges with human-like personas achieve even higher accuracy (78.5%) than human judges (70.1%) and default AI judges without personas (69.8%), suggesting their potential for supervising frontier AI models. These findings highlight AI debate as a promising path toward scalable, bias-resilient oversight, leveraging both diverse human and AI judgments to move closer to truth in contested domains.
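The debate and consultancy protocols contrasted in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, turn structure, and the toy stand-in agents below are all assumptions; in the study, the debaters and judge would be LLM calls and the judge may be a human or a persona-conditioned model.

```python
# Illustrative sketch of debate vs. single-advisor consultancy.
# All names here are hypothetical; real agents would be LLM calls.
from dataclasses import dataclass

@dataclass
class Turn:
    side: str       # "pro" or "con"
    argument: str

def run_debate(claim, pro_agent, con_agent, judge, rounds=2):
    """Two advisor agents argue opposing sides; a judge scores the transcript."""
    transcript = []
    for _ in range(rounds):
        transcript.append(Turn("pro", pro_agent(claim, transcript)))
        transcript.append(Turn("con", con_agent(claim, transcript)))
    return judge(claim, transcript)   # (verdict, confidence)

def run_consultancy(claim, advisor, judge, rounds=2):
    """Baseline: a single advisor argues one assigned side, unopposed."""
    transcript = [Turn("pro", advisor(claim, [])) for _ in range(rounds)]
    return judge(claim, transcript)

# Toy stand-ins so the sketch runs end to end:
pro = lambda claim, t: f"Evidence supporting: {claim}"
con = lambda claim, t: f"Evidence against: {claim}"
judge = lambda claim, t: ("true" if any(x.side == "con" for x in t) else "unverified", 0.6)

verdict, confidence = run_debate("Vaccines reduce severe COVID-19 outcomes", pro, con, judge)
```

The key structural difference is that the debate transcript always contains both sides, so the judge sees counter-evidence even when their prior favors one position, which is the mechanism the abstract credits for the accuracy gains.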
Problem

Research questions and friction points this paper is trying to address.

Assessing controversial claims with AI debate to reduce misinformation
Improving judgment accuracy in biased human judges via AI debates
Exploring AI debate for scalable oversight in contested domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI debate improves judgment accuracy
Personalized AI judges mimic human beliefs
Debate outperforms single-advisor consultancy
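The persona-judge idea above can be illustrated with a prompt-assembly sketch. The persona texts and prompt wording below are invented for illustration; the paper's actual prompts and any fine-tuning setup are not reproduced here.

```python
# Hedged sketch of a persona-conditioned AI judge prompt.
# PERSONAS and the prompt template are hypothetical examples.
PERSONAS = {
    "mainstream": "You generally trust public-health authorities and peer review.",
    "skeptic": "You distrust official sources and demand independent evidence.",
}

def build_judge_prompt(persona_key, claim, transcript):
    """Assemble the prompt a persona judge would use to score a debate."""
    persona = PERSONAS[persona_key]
    turns = "\n".join(f"[{t['side']}] {t['text']}" for t in transcript)
    return (
        f"{persona}\n\n"
        f"Claim: {claim}\n"
        f"Debate transcript:\n{turns}\n\n"
        "Answer 'true' or 'false' and give a confidence in [0, 1]."
    )

prompt = build_judge_prompt(
    "skeptic",
    "Masks reduce transmission",
    [{"side": "pro", "text": "RCT evidence..."},
     {"side": "con", "text": "Confounding concerns..."}],
)
```

Conditioning the same underlying model on different personas is what lets the study compare judges that mimic mainstream and skeptical human belief systems against a default, persona-free judge.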