MAD: A Benchmark for Multi-Turn Audio Dialogue Fact-Checking

📅 2025-08-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing fact-checking research primarily focuses on isolated textual or spoken utterances, failing to address the complexities of multi-turn spoken dialogue, such as speaker turn-taking, speech overlap, prosodic variation, and dynamic information propagation. To bridge this gap, we introduce MAD, the first benchmark dataset for fact-checking real-world multi-turn audio dialogues. MAD jointly models acoustic signals, dialogue structure, and information diffusion patterns, providing sentence-level and dialogue-level veracity labels, check-worthiness scores, and multimodal scenario annotations. It systematically formalizes the reasoning challenges inherent in spoken interaction, filling a critical data void in joint speech-dialogue fact-checking. Extensive evaluation reveals that strong baseline models achieve only 72–74% accuracy on sentence-level verification and 71–72% on dialogue-level verification, substantially lower than on corresponding text-based tasks, confirming the task's heightened difficulty and exposing fundamental limitations of current multimodal understanding approaches.

📝 Abstract
Despite the growing popularity of audio platforms, fact-checking spoken content remains significantly underdeveloped. Misinformation in speech often unfolds across multi-turn dialogues, shaped by speaker interactions, disfluencies, overlapping speech, and emotional tone, factors that complicate both claim detection and verification. Existing datasets fall short by focusing on isolated sentences or text transcripts, without modeling the conversational and acoustic complexity of spoken misinformation. We introduce MAD (Multi-turn Audio Dialogues), the first fact-checking dataset aligned with multi-turn spoken dialogues and corresponding audio. MAD captures how misinformation is introduced, contested, and reinforced through natural conversation. Each dialogue includes annotations for speaker turns, dialogue scenarios, information spread styles, sentence-level check-worthiness, and both sentence- and dialogue-level veracity. The dataset supports two core tasks: check-worthy claim detection and claim verification. Benchmarking shows that even strong pretrained models reach only 72–74% accuracy at the sentence level and 71–72% at the dialogue level in claim verification, underscoring MAD's difficulty. MAD offers a high-quality benchmark for advancing multimodal and conversational fact-checking, while also surfacing open challenges related to reasoning over speech and dialogue dynamics.
Problem

Research questions and friction points this paper is trying to address.

Detecting misinformation in multi-turn audio dialogues
Verifying claims with conversational and acoustic complexity
Addressing gaps in existing text-only fact-checking datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

First fact-checking dataset for multi-turn audio dialogues
Includes speaker turns, dialogue scenarios, and veracity annotations
Supports check-worthy claim detection and claim verification
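The annotation layers listed above can be pictured as a simple record schema. The sketch below is illustrative only: the field names, label values, and the dialogue-level aggregation rule are assumptions for exposition, not MAD's actual data format.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Sentence:
    speaker: str                      # speaker-turn label
    text: str                         # transcript of the utterance
    check_worthy: bool                # sentence-level check-worthiness
    veracity: Optional[str] = None    # sentence-level label, e.g. "true"/"false"

@dataclass
class Dialogue:
    scenario: str                     # dialogue scenario annotation
    spread_style: str                 # how information spreads in the conversation
    sentences: List[Sentence] = field(default_factory=list)

    def dialogue_veracity(self) -> str:
        # Illustrative aggregation rule (the paper's actual labeling may
        # differ): a dialogue counts as "false" if any check-worthy
        # sentence is labeled false.
        labels = [s.veracity for s in self.sentences if s.check_worthy]
        return "false" if "false" in labels else "true"
```

A record like this supports both benchmark tasks: `check_worthy` drives claim detection, while `veracity` and `dialogue_veracity()` drive sentence- and dialogue-level verification.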
Chaewan Chun
The Pennsylvania State University, University Park, 16802, PA, USA
Lysandre Terrisse
The Pennsylvania State University, University Park, 16802, PA, USA
Delvin Ce Zhang
Assistant Professor, University of Sheffield
Multimodal LLMs, AI for Science
Dongwon Lee
The Pennsylvania State University, University Park, 16802, PA, USA