🤖 AI Summary
This study addresses the limitations of current fact-checking approaches, which are predominantly text-centric and ill-equipped to handle misinformation in audio-based platforms where prosody, emotion, and multi-turn dialogue structures play critical roles. It systematically uncovers the dual characteristics of audio misinformation—its "audibility" and "dialogicity"—and demonstrates how conventional text-centered paradigms fail to capture these dimensions. To bridge this gap, the work proposes a novel framework integrating cross-modal analysis, dialogue structure modeling, and speech feature interpretation. Through comprehensive evaluation of existing datasets and methods, the research elucidates the mechanisms underlying the failure of current techniques in audio contexts, thereby establishing a theoretical foundation for next-generation fact-checking systems tailored to spoken, interactive media.
📝 Abstract
Audio platforms have evolved beyond entertainment. They have become central to public discourse, from podcasts and radio to WhatsApp voice notes and live streams. With millions of shows and hundreds of millions of listeners, audio platforms are now a major channel for misinformation. Yet existing fact-checking pipelines are mostly designed for written claims, overlooking the unique properties of spoken media. We argue that audio misinformation is not merely textual content with transcripts: it is structurally different because it is both spoken (carrying persuasive force through prosody, pacing, and emotion) and conversational (unfolding across turns, speakers, and episodes). These two properties introduce verification difficulties that traditional methods rarely face. This position paper synthesizes evidence across modalities and platforms, examines datasets and methods, and highlights why existing pipelines fail on audio. We argue that advancing fact-checking requires rethinking verification pipelines around the spoken and conversational realities of audio.