CREMD: Crowd-Sourced Emotional Multimodal Dogs Dataset

📅 2026-02-16
🤖 AI Summary
This study addresses the reliability challenges in canine emotion recognition that stem from subjective assessment and the lack of standardized annotation protocols. To this end, the authors construct a multimodal dataset of 923 video clips and, for the first time, combine crowdsourced labeling with three presentation modes (video only, video with context, and video with both context and audio) to systematically investigate how annotator characteristics (e.g., dog ownership, gender, and professional expertise) influence emotion judgments. The findings show that adding visual context significantly improves inter-annotator agreement; the effect of audio on overall agreement is inconclusive due to design limitations, though audio substantially increases annotators' confidence in identifying anger and fear. Notably, and contrary to expectations, non-owners and male annotators exhibit higher agreement than dog owners and female annotators, while professionals show higher agreement, as hypothesized. This work establishes a standardized dataset and methodological framework for animal emotion recognition research.

📝 Abstract
Dog emotion recognition plays a crucial role in enhancing human-animal interactions, veterinary care, and the development of automated systems for monitoring canine well-being. However, accurately interpreting dog emotions is challenging due to the subjective nature of emotional assessments and the absence of standardized ground truth methods. We present the CREMD (Crowd-sourced Emotional Multimodal Dogs Dataset), a comprehensive dataset exploring how different presentation modes (e.g., context, audio, video) and annotator characteristics (e.g., dog ownership, gender, professional experience) influence the perception and labeling of dog emotions. The dataset consists of 923 video clips presented in three distinct modes: without context or audio, with context but no audio, and with both context and audio. We analyze annotations from diverse participants, including dog owners, professionals, and individuals with varying demographic backgrounds and experience levels, to identify factors that influence reliable dog emotion recognition. Our findings reveal several key insights: (1) while adding visual context significantly improved annotation agreement, our findings regarding audio cues are inconclusive due to design limitations (specifically, the absence of a no-context-with-audio condition and limited clean audio availability); (2) contrary to expectations, non-owners and male annotators showed higher agreement levels than dog owners and female annotators, respectively, while professionals showed higher agreement levels, aligned with our initial hypothesis; and (3) the presence of audio substantially increased annotators' confidence in identifying specific emotions, particularly anger and fear.
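The abstract repeatedly compares "annotation agreement" across presentation modes and annotator groups. The paper does not specify which agreement statistic is used, so the sketch below is purely illustrative: it implements Fleiss' kappa, a standard chance-corrected agreement measure for multiple raters assigning categorical labels (here, hypothetical emotion categories), using only the standard library.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a ratings matrix.

    counts[i][j] = number of raters who assigned category j to clip i.
    Every clip must be rated by the same number of raters.
    Returns 1.0 for perfect agreement, ~0 for chance-level agreement.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])  # raters per item (constant across items)
    n_cats = len(counts[0])

    # Per-item observed agreement: fraction of agreeing rater pairs.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Expected agreement from the overall category distribution.
    totals = [sum(row[j] for row in counts) for j in range(n_cats)]
    p_j = [t / (n_items * n_raters) for t in totals]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)


# Toy example: 3 clips, 4 annotators, 3 emotion categories
# (e.g., happy / angry / fearful). All annotators agree on every clip.
perfect = [[4, 0, 0], [0, 4, 0], [4, 0, 0]]
print(fleiss_kappa(perfect))  # 1.0

# Partial disagreement lowers kappa (it can even go negative).
mixed = [[2, 0], [1, 1]]
print(fleiss_kappa(mixed))
```

A per-mode comparison like the paper's would compute this statistic separately on the labels collected under each of the three presentation conditions.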
Problem

Research questions and friction points this paper is trying to address.

dog emotion recognition
subjective assessment
ground truth
annotation agreement
multimodal perception
Innovation

Methods, ideas, or system contributions that make the work stand out.

dog emotion recognition
multimodal dataset
crowdsourced annotation
context and audio cues
annotation agreement
Jinho Baek
Dept. of Computer Science, New York Institute of Technology, New York, USA
Houwei Cao
Assistant Professor of Computer Science, New York Institute of Technology
Speech & Natural Language Processing, Affective Computing, Big Data Analytics
Kate Blackwell
Dept. of Computer Science, New York Institute of Technology, New York, USA