ViClaim: A Multilingual Multilabel Dataset for Automatic Claim Detection in Videos

📅 2025-04-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing fact-checking tools primarily target written text and perform poorly on spoken-language transcripts from videos, especially in multilingual and multi-topic settings. This paper introduces a fine-grained claim detection task designed specifically for video speech transcripts, covering six topics across three languages (English, German, Spanish). The authors construct ViClaim, a publicly available dataset of 1,798 annotated video transcripts in which each sentence is labeled with three claim-related categories: fact-check-worthy, fact-non-check-worthy, or opinion. To handle the linguistic characteristics of spoken language, they develop a custom annotation tool to support the highly complex annotation process. Experiments with state-of-the-art multilingual language models reach a macro-F1 score of up to 0.896 in cross-validation, though generalization to unseen topics remains limited, particularly for distinct domains. The work bridges a critical gap in video-based misinformation detection by enabling systematic analysis of spoken content and providing a reproducible benchmark.
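For reference, the reported metric is macro-F1: the unweighted mean of per-class F1 scores over the three claim categories. A minimal pure-Python sketch on toy labels (the example sentences and predictions are illustrative, not from the paper's data):

```python
LABELS = ["fact-check-worthy", "fact-non-check-worthy", "opinion"]

def macro_f1(gold, pred, labels=LABELS):
    """Unweighted average of per-class F1 scores."""
    f1s = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1s.append(f1)
    return sum(f1s) / len(f1s)

# Toy gold labels and hypothetical model predictions for six sentences.
gold = ["fact-check-worthy", "opinion", "fact-non-check-worthy",
        "fact-check-worthy", "opinion", "fact-non-check-worthy"]
pred = ["fact-check-worthy", "opinion", "fact-non-check-worthy",
        "opinion", "opinion", "fact-non-check-worthy"]
print(round(macro_f1(gold, pred), 3))  # → 0.822
```

Macro averaging gives each class equal weight regardless of frequency, which matters here because check-worthy claims are typically a minority of transcript sentences.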

📝 Abstract
The growing influence of video content as a medium for communication and misinformation underscores the urgent need for effective tools to analyze claims in multilingual and multi-topic settings. Existing efforts in misinformation detection largely focus on written text, leaving a significant gap in addressing the complexity of spoken text in video transcripts. We introduce ViClaim, a dataset of 1,798 annotated video transcripts across three languages (English, German, Spanish) and six topics. Each sentence in the transcripts is labeled with three claim-related categories: fact-check-worthy, fact-non-check-worthy, or opinion. We developed a custom annotation tool to facilitate the highly complex annotation process. Experiments with state-of-the-art multilingual language models demonstrate strong performance in cross-validation (macro F1 up to 0.896) but reveal challenges in generalization to unseen topics, particularly for distinct domains. Our findings highlight the complexity of claim detection in video transcripts. ViClaim offers a robust foundation for advancing misinformation detection in video-based communication, addressing a critical gap in multimodal analysis.
Problem

Research questions and friction points this paper is trying to address.

Detecting claims effectively in multilingual video transcripts
Closing the misinformation-detection gap for spoken text in videos
Improving generalization of claim detection to unseen topics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual video transcript dataset for claim detection
Custom annotation tool for complex labeling process
Cross-validation benchmark with state-of-the-art multilingual language models
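The sentence-level annotation scheme described above could be modeled as a simple record type. This is an illustrative sketch only; the field names and language/topic codes are assumptions, not ViClaim's released schema:

```python
from dataclasses import dataclass

# Label set and languages from the paper; everything else here
# (field names, codes, example values) is a hypothetical illustration.
LABELS = {"fact-check-worthy", "fact-non-check-worthy", "opinion"}
LANGUAGES = {"en", "de", "es"}

@dataclass(frozen=True)
class AnnotatedSentence:
    transcript_id: str  # hypothetical ID for one of the 1,798 transcripts
    language: str       # "en", "de", or "es"
    topic: str          # one of the six topics
    text: str           # the transcript sentence
    label: str          # one of the three claim-related categories

    def __post_init__(self):
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label}")
        if self.language not in LANGUAGES:
            raise ValueError(f"unknown language: {self.language}")

sent = AnnotatedSentence("vid_001", "en", "health",
                         "The vaccine was approved in 2021.",
                         "fact-check-worthy")
print(sent.label)  # → fact-check-worthy
```

Validating the label and language at construction time mirrors what an annotation tool would enforce, keeping downstream training data consistent with the three-class scheme.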