🤖 AI Summary
Current short-video platforms lack large-scale, multimodal, fine-grained benchmark datasets for harmful content (e.g., clickbait, hate speech, misinformation). To address this gap, we introduce the first large-scale, multi-perspective, collaboratively annotated dataset of harmful YouTube videos, comprising 60,906 candidate videos and 19,422 samples annotated by experts, GPT-4-Turbo, and crowdworkers. It supports both binary classification and fine-grained multi-label identification across six harm categories. We propose a novel "expert + large language model + crowd" tri-source collaborative annotation paradigm that ensures annotation consistency and enables source-disaggregated subset analysis. The multimodal inputs integrate 14 keyframes, a thumbnail, and textual metadata, processed via GPT-4-Turbo zero-shot prompting guided by expert-defined protocols. Evaluation shows substantial gains: 92.3% binary accuracy (against expert consensus) and an average 17.6% macro-F1 improvement in six-class multi-label classification over single-source baselines.
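The multimodal input described above (14 keyframes, 1 thumbnail, and text metadata fed to GPT-4-Turbo in a zero-shot setting) could be assembled roughly as sketched below. This is a minimal illustration, not the authors' actual pipeline: the function name, message schema, and metadata fields are assumptions.

```python
# Hypothetical sketch: combine 14 sampled keyframes, a thumbnail, and
# text metadata into a single chat-style zero-shot prompt payload.
# All names and field layouts are illustrative assumptions.

HARM_CATEGORIES = [
    "Information", "Hate and harassment", "Addictive",
    "Clickbait", "Sexual", "Physical",
]

def build_annotation_prompt(keyframes, thumbnail, metadata):
    """Combine visual and textual inputs into a chat-style message list."""
    if len(keyframes) != 14:
        raise ValueError("expected exactly 14 keyframes")
    # Expert-defined protocol distilled into a system instruction.
    instruction = (
        "Classify the video as harmful or harmless; if harmful, assign "
        "one or more of these categories: " + ", ".join(HARM_CATEGORIES)
    )
    # Thumbnail first, then the 14 frames, then the textual metadata.
    images = [{"type": "image", "data": f} for f in [thumbnail] + keyframes]
    text = {"type": "text",
            "data": f"Title: {metadata['title']}\n"
                    f"Description: {metadata['description']}"}
    return [{"role": "system", "content": instruction},
            {"role": "user", "content": images + [text]}]
```

The single user turn carrying 15 images plus metadata mirrors the summary's description of one zero-shot call per video.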
📝 Abstract
Short video platforms, such as YouTube, Instagram, and TikTok, are used by billions of users. These platforms expose users to harmful content, ranging from clickbait and physical harm to hate and misinformation. Yet, we lack a comprehensive understanding and measurement of online harm on short video platforms. Toward this end, we present two large-scale datasets of multi-modal and multi-categorical online harm: (1) 60,906 systematically selected potentially harmful YouTube videos and (2) 19,422 videos annotated by three labeling actors: trained domain experts, GPT-4-Turbo (using 14 image frames, 1 thumbnail, and text metadata), and crowdworkers (Amazon Mechanical Turk master workers). The annotated dataset includes both (a) binary classification (harmful vs. harmless) and (b) multi-label categorizations of six harm categories: Information, Hate and harassment, Addictive, Clickbait, Sexual, and Physical harms. Furthermore, the annotated dataset provides (1) ground truth data with videos annotated consistently across (a) all three actors and (b) the majority of the labeling actors, and (2) three data subsets labeled by individual actors. These datasets are expected to facilitate future work on online harm, aid in (multi-modal) classification efforts, and advance the identification and potential mitigation of harmful content on video platforms.
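The ground-truth construction described above (labels consistent across all three actors, or across a majority of them) suggests a simple vote-aggregation step. The following is a minimal sketch under that assumption; the function names and the ≥2-of-3 threshold for harm categories are illustrative, not taken from the paper.

```python
# Hypothetical sketch of deriving consensus labels from the three
# labeling actors (experts, GPT-4-Turbo, crowdworkers).
from collections import Counter

def binary_consensus(labels):
    """labels: dict actor -> 'harmful' / 'harmless'.
    Returns (majority label, whether all three actors agreed)."""
    counts = Counter(labels.values())
    label, n = counts.most_common(1)[0]
    return label, n == len(labels)

def category_consensus(category_sets):
    """category_sets: one set of harm categories per actor.
    Keep a category if at least 2 of the 3 actors assigned it."""
    counts = Counter(c for s in category_sets for c in s)
    return {c for c, n in counts.items() if n >= 2}
```

With three actors and a binary label, a strict majority always exists, so `binary_consensus` never ties; the unanimity flag distinguishes the "all three actors" subset from the "majority" subset described in the abstract.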