ICSD: An Open-source Dataset for Infant Cry and Snoring Detection

📅 2024-08-20
🏛️ arXiv.org
🤖 AI Summary
Existing audio datasets lack sufficient, strongly labeled data for infant crying and snoring events, hindering robust acoustic event detection research. To address this, we introduce ICSD—the first open-source benchmark dataset specifically designed for infant and toddler acoustic event detection. ICSD comprises over 1,200 hours of audio, organized into three complementary subsets: (i) real-world strongly labeled data (event-level annotations), (ii) weakly labeled data (clip-level annotations), and (iii) controllably synthesized strongly labeled data. It is the first dataset to systematically integrate multi-paradigm labeling strategies—balancing ecological validity with scalability—supported by human-curated ground truth, weakly supervised annotation pipelines, respiration- and speech-informed synthesis models, and multi-granularity quality assessment. Extensive experiments with CNN- and Transformer-based baselines demonstrate that joint training on all three subsets improves F1-score by 12.3%, establishing ICSD as the new standard benchmark for this domain.

📝 Abstract
The detection and analysis of infant cry and snoring events are crucial tasks within the field of audio signal processing. While existing datasets for general sound event detection are plentiful, they often fall short in providing sufficient, strongly labeled data specific to infant cries and snoring. To provide a benchmark dataset and thus foster the research of infant cry and snoring detection, this paper introduces the Infant Cry and Snoring Detection (ICSD) dataset, a novel, publicly available dataset specially designed for ICSD tasks. The ICSD comprises three types of subsets: a real strongly labeled subset with event-based labels annotated manually, a weakly labeled subset with only clip-level event annotations, and a synthetic subset generated and labeled with strong annotations. This paper provides a detailed description of the ICSD creation process, including the challenges encountered and the solutions adopted. We offer a comprehensive characterization of the dataset, discussing its limitations and key factors for ICSD usage. Additionally, we conduct extensive experiments on the ICSD dataset to establish baseline systems and offer insights into the main factors when using this dataset for ICSD research. Our goal is to develop a dataset that will be widely adopted by the community as a new open benchmark for future ICSD research.
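The difference between the strongly and weakly labeled subsets described above can be illustrated with a minimal sketch. It assumes a strong annotation is a class label plus onset and offset times, and a weak annotation is only the set of classes present in a clip. The class names (`infant_cry`, `snoring`), the `StrongEvent` type, and the helper functions are hypothetical and are not taken from the ICSD release or its file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrongEvent:
    """One event-level (strong) annotation: what happened and when."""
    label: str    # hypothetical class name, e.g. "infant_cry" or "snoring"
    onset: float  # event start, in seconds from the start of the clip
    offset: float  # event end, in seconds from the start of the clip


def to_weak_labels(events: list[StrongEvent]) -> set[str]:
    """Collapse strong annotations to clip-level (weak) labels by dropping timing."""
    return {event.label for event in events}


def total_duration(events: list[StrongEvent], label: str) -> float:
    """Sum the annotated duration of one class, assuming its events do not overlap."""
    return sum(e.offset - e.onset for e in events if e.label == label)


# Illustrative annotations for a single clip (made-up values, not ICSD data).
clip_events = [
    StrongEvent("infant_cry", 1.0, 3.5),
    StrongEvent("snoring", 4.0, 6.0),
    StrongEvent("infant_cry", 7.0, 8.0),
]

weak = to_weak_labels(clip_events)
cry_seconds = total_duration(clip_events, "infant_cry")
```

A strong label can always be reduced to a weak one this way, but not the reverse, which is why strongly labeled and synthetic strongly labeled data are the costly part of such a dataset.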
Problem

Research questions and friction points this paper is trying to address.

Lack of sufficient, strongly labeled data for infant cry and snoring detection
Need for a benchmark dataset for ICSD research
Challenges in creating diverse and annotated ICSD subsets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-source dataset for infant cry and snoring detection
Combines real strongly labeled, weakly labeled, and synthetic strongly labeled subsets
Provides baseline systems for ICSD research
Authors
Qingyu Liu
Electronic and Computer Engineering, Peking University
wireless networking, mobile networking, internet of things, intelligent transportation
Longfei Song
Shanghai Engineering Research Center of Intelligent Education and Bigdata, Shanghai Normal University, Shanghai, China
Dongxing Xu
Unisound AI Technology Co., Ltd., Beijing, China
Yanhua Long
Professor, Shanghai Normal University
Speech signal processing