Infant Cry Detection Using Causal Temporal Representation

📅 2025-03-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Infant cry detection faces two practical problems: fine-grained temporal annotations are scarce, and background noise interferes heavily with the signal. To address them, this paper introduces CrySeg, a professionally annotated dataset designed for temporal cry segmentation. It also proposes CRSTC (Causal Representation Sparse Transition Clustering), an unsupervised method built on causal temporal representation learning. CRSTC combines causal discovery for modeling time-series dynamics, temporal contrastive learning for discriminative feature extraction, and sparse transition graph clustering for event boundary inference, so that cry events can be segmented without manual labels. Evaluated under realistic noisy conditions, CRSTC is reported to perform on par with state-of-the-art supervised methods. Using the detected cry segments also improves downstream infant cry classification, which makes the approach a candidate annotation-free foundation for infant monitoring systems.

📝 Abstract
This paper addresses a major challenge in acoustic event detection, in particular infant cry detection in the presence of other sounds and background noises: the lack of precise annotated data. We present two contributions for supervised and unsupervised infant cry detection. The first is an annotated dataset for cry segmentation, which enables supervised models to achieve state-of-the-art performance. Additionally, we propose a novel unsupervised method, Causal Representation Sparse Transition Clustering (CRSTC), based on causal temporal representation, which helps address the issue of data scarcity more generally. By integrating the detected cry segments, we significantly improve the performance of downstream infant cry classification, highlighting the potential of this approach for infant care applications.
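The abstract describes passing detected cry segments to a downstream classifier. The following is a minimal sketch of that hand-off only, not the authors' CRSTC method: it assumes some detector has already produced one label per audio frame, and shows how those labels could be merged into time segments and used to crop the waveform. The function names, the `cry_label` convention, and the `min_duration_s` filter are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def frames_to_segments(
    labels: Sequence[int],
    frame_hop_s: float,
    cry_label: int = 1,          # assumed convention: 1 marks a cry frame
    min_duration_s: float = 0.0,  # hypothetical filter for very short detections
) -> List[Tuple[float, float]]:
    """Merge runs of consecutive cry frames into (start_s, end_s) segments."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, label in enumerate(labels):
        if label == cry_label and start is None:
            start = i                      # a cry run begins at frame i
        elif label != cry_label and start is not None:
            runs.append((start, i))        # the run ended at the previous frame
            start = None
    if start is not None:                  # close a run that reaches the end
        runs.append((start, len(labels)))

    # Convert frame indices to seconds and drop runs shorter than the minimum.
    return [
        (s * frame_hop_s, e * frame_hop_s)
        for s, e in runs
        if (e - s) * frame_hop_s >= min_duration_s
    ]


def crop_segments(
    samples: Sequence[float],
    sample_rate: int,
    segments: Sequence[Tuple[float, float]],
) -> List[Sequence[float]]:
    """Slice the waveform so that only the detected cry regions remain."""
    return [
        samples[int(round(start * sample_rate)): int(round(end * sample_rate))]
        for start, end in segments
    ]
```

In this sketch, the cropped clips returned by `crop_segments` would be what a cry classifier receives instead of the full noisy recording. How the paper actually integrates segments into classification is not specified in the abstract.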
Problem

Research questions and friction points this paper is trying to address.

Lack of precise annotated data for infant cry detection.
Challenges in detecting infant cries amidst background noises.
Improving infant cry classification through advanced detection methods.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Annotated dataset for cry segmentation
Unsupervised CRSTC method for data scarcity
Improved infant cry classification performance
Authors

Minghao Fu
Mohamed Bin Zayed University of Artificial Intelligence; Cradle AI
Danning Li
McGill University
Aryan Gadhiya
McGill University
Benjamin Lambright
McGill University
Mohamed Alowais
McGill University
Mohab Bahnassy
McGill University
Saad El Dine Elletter
McGill University
Hawau Olamide Toyin
PhD student at MBZUAI
Speech Synthesis and Recognition, Stuttering Speech, NLP, ML
Haiyan Jiang
Mohamed Bin Zayed University of Artificial Intelligence; Cradle AI
Kun Zhang
Mohamed Bin Zayed University of Artificial Intelligence; Cradle AI
Hanan Aldarmaki
MBZUAI
NLP, Speech Processing