A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild

📅 2025-06-11



🤖 AI Summary

Existing child image datasets lack multimodal support and do not adequately cover real-world scenarios, fictional characters (e.g., cartoons), or edge cases such as partial occlusion, which severely limits research on minor-content detection. To address this, we introduce ICCWD, to our knowledge the first multimodal benchmark designed for this task. It comprises 10,000 image-caption pairs, manually labeled for the presence or absence of a child, that span realistic, fictional, and partially visible depictions. ICCWD supports joint image-text modeling and fills a critical gap in multimodal minor-content detection benchmarks. An evaluation of three detectors, including a commercial age estimation system, shows that the best one achieves only a 75.3% true positive rate, underscoring the task's substantial difficulty. We publicly release ICCWD to foster the development of more robust, age-sensitive content recognition systems.


📝 Abstract

Platforms and the law regulate digital content depicting minors (defined as individuals under 18 years of age) differently from other types of content. Given the sheer amount of content that needs to be assessed, machine learning-based automation tools are commonly used to detect content depicting minors. To our knowledge, no dataset or benchmark currently exists for evaluating such identification methods in a multimodal setting. To fill this gap, we release the Image-Caption Children in the Wild Dataset (ICCWD), an image-caption dataset aimed at benchmarking tools that detect depictions of minors. Our dataset is richer than previous child image datasets, containing images of children in a variety of contexts, including fictional depictions and partially visible bodies. ICCWD contains 10,000 image-caption pairs manually labeled to indicate the presence or absence of a child in the image. To demonstrate the potential utility of our dataset, we use it to benchmark three different detectors, including a commercial age estimation system applied to images. Our results suggest that child detection is a challenging task, with the best method achieving a 75.3% true positive rate. We hope the release of our dataset will aid in the design of better minor detection methods across a wide range of scenarios.
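The headline metric above is the true positive rate, TPR = TP / (TP + FN), computed over the pairs labeled as containing a child. The following is a minimal sketch of how a detector could be scored against binary presence labels of this kind. The `Pair` record, its field names, and the keyword-matching detector are all hypothetical illustrations, not ICCWD's released schema or any of the three detectors benchmarked in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Pair:
    # Hypothetical record for one image-caption pair (not the official schema).
    image_path: str
    caption: str
    has_child: bool  # manual label: is a child present in the image?


def true_positive_rate(pairs: Iterable[Pair], detector: Callable[[Pair], bool]) -> float:
    """TPR = TP / (TP + FN), counted only over pairs labeled as containing a child."""
    tp = fn = 0
    for pair in pairs:
        if not pair.has_child:
            continue  # negative pairs do not enter the TPR
        if detector(pair):
            tp += 1
        else:
            fn += 1
    if tp + fn == 0:
        raise ValueError("no positive pairs; TPR is undefined")
    return tp / (tp + fn)


def keyword_detector(pair: Pair) -> bool:
    # Hypothetical caption-only baseline: flags captions mentioning a child.
    caption = pair.caption.lower()
    return any(word in caption for word in ("child", "kid"))


pairs = [
    Pair("a.jpg", "A child playing in a park", True),
    Pair("b.jpg", "A cartoon kid with a balloon", True),
    Pair("c.jpg", "A small hand holding a crayon", True),  # partial view, missed by keywords
    Pair("d.jpg", "Two children at school", True),
    Pair("e.jpg", "An adult reading a book", False),
]

print(true_positive_rate(pairs, keyword_detector))  # prints 0.75
```

The toy caption-only detector misses the partially visible child in `c.jpg`, which illustrates why a benchmark covering partial depictions and both modalities can expose weaknesses that a single-modality check hides.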

Problem

Research questions and friction points this paper is trying to address.

Lack of a dataset for detecting minors in multimodal content

Need to benchmark tools that identify depictions of children

Challenges in achieving high accuracy in minor detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal image-caption dataset for minor detection

Benchmarking of three detectors, including a commercial age estimation system

10,000 diverse, manually labeled image-caption pairs
