InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries

📅 2024-09-29

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

To address background-noise interference and scarce labeled data in real-world infant cry analysis, this paper proposes InfantCryNet, a data-driven framework for detecting cries and classifying their cause. It builds on pre-trained audio models to bring in prior knowledge, and it extracts features with statistical pooling and multi-head attention pooling. Knowledge distillation and quantization are then applied to compress the model for deployment on mobile devices. On real-life datasets, the framework outperforms state-of-the-art baselines by 4.4% in classification accuracy. Compression reduces the model size by 7% without compromising performance, or by up to 28% with an 8% decrease in accuracy, which gives practical guidance on the trade-off between model size and accuracy.

📝 Abstract

Understanding the meaning of infant cries is a significant challenge for young parents in caring for their newborns. The presence of background noise and the lack of labeled data present practical challenges in developing systems that can detect crying and analyze its underlying reasons. In this paper, we present a novel data-driven framework, "InfantCryNet," for accomplishing these tasks. To address the issue of data scarcity, we employ pre-trained audio models to incorporate prior knowledge into our model. We propose the use of statistical pooling and multi-head attention pooling techniques to extract features more effectively. Additionally, knowledge distillation and model quantization are applied to enhance model efficiency and reduce the model size, better supporting industrial deployment on mobile devices. Experiments on real-life datasets demonstrate the superior performance of the proposed framework, outperforming state-of-the-art baselines by 4.4% in classification accuracy. The model compression effectively reduces the model size by 7% without compromising performance and by up to 28% with only an 8% decrease in accuracy, offering practical insights for model selection and system design.
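The two pooling techniques named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the version below is an assumption: statistical pooling concatenates the per-dimension mean and standard deviation over time, and each attention head scores frames with its own query vector (here the hypothetical `head_queries`, which would be learned in practice). The frame values are made-up toy numbers, not data from the paper.

```python
import math


def stat_pool(frames):
    """Statistical pooling: concatenate per-dimension mean and std over time."""
    t, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / t for d in range(dim)]
    std = [
        math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / t)
        for d in range(dim)
    ]
    return mean + std


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention_pool(frames, head_queries):
    """Each head weights frames by softmax(frame . query) over time and
    averages them; the per-head results are concatenated."""
    dim = len(frames[0])
    pooled = []
    for q in head_queries:
        scores = [sum(f[d] * q[d] for d in range(dim)) for f in frames]
        weights = softmax(scores)
        pooled += [
            sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)
        ]
    return pooled


# Toy input: 3 time frames with 2 features each (illustrative values only).
frames = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
# Hypothetical query vectors, one per head; a real model would learn these.
head_queries = [[0.0, 0.0], [1.0, 0.0]]

stats = stat_pool(frames)
attn = multi_head_attention_pool(frames, head_queries)
```

Both functions turn a variable-length sequence of frame embeddings into a fixed-length vector, which is what a downstream classifier needs. A head with an all-zero query reduces to plain mean pooling, while a non-zero query shifts weight toward the frames it scores highest.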

Problem

Research questions and friction points this paper is trying to address.

Detect infant cries and analyze their underlying reasons despite background noise.

Overcome data scarcity with pre-trained audio models.

Enhance model efficiency for mobile device deployment.
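As a rough illustration of how quantization shrinks a model for mobile deployment, the sketch below applies symmetric per-tensor 8-bit quantization to a list of weights. This is a generic textbook scheme, not necessarily the one used in the paper, which this summary does not specify. The function names and the weight values are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in quantized]


# Illustrative weights only; real layers hold thousands to millions of values.
weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Worst-case rounding error is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each quantized weight fits in one byte instead of the four bytes of a 32-bit float, at the cost of a rounding error of at most half of `scale` per weight. How much of a model is quantized, and at what precision, determines the actual size reduction and accuracy loss.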

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-trained audio models

Statistical pooling and multi-head attention pooling

Knowledge distillation and quantization
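Knowledge distillation, listed above, trains a small student model to match a larger teacher. A minimal sketch of the commonly used loss follows, assuming the standard temperature-softened formulation: a weighted sum of cross-entropy on the true label and a KL divergence between softened teacher and student distributions. The paper's exact loss, temperature, and weighting are not given in this summary, so `temperature`, `alpha`, and the logits below are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * cross-entropy(student, label)
    + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    # Hard-label term: cross-entropy at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # Soft-label term: KL divergence between temperature-softened outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


# Illustrative logits for a 3-class cry-cause problem (made-up values).
teacher = [2.0, 0.5, -1.0]
student_far = [0.0, 1.5, 0.2]

# With alpha=0 only the soft term remains, so identical logits give zero loss.
loss_same = distillation_loss(teacher, teacher, label=0, alpha=0.0)
loss_far = distillation_loss(student_far, teacher, label=0, alpha=0.0)
```

The `temperature ** 2` factor keeps the gradient scale of the soft term comparable across temperatures, which is the usual convention for this loss.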
