CREMD: Crowd-Sourced Emotional Multimodal Dogs Dataset

📅 2026-02-16
🤖 AI Summary
This study addresses the reliability challenges in canine emotion recognition that stem from subjective assessment and the lack of standardized annotation protocols. To this end, the authors construct a multimodal dataset of 923 video clips and, for the first time, combine crowdsourced labeling with three presentation modes (video only, video with context, and video with both context and audio) to systematically investigate how annotator characteristics (e.g., dog ownership, gender, and professional expertise) influence emotion judgments. The findings show that adding visual context significantly improves inter-annotator agreement; the effect of audio on overall agreement is inconclusive due to design limitations, though audio substantially increases annotators' confidence in identifying anger and fear. Notably, and contrary to expectations, non-owners and male annotators exhibit higher agreement than dog owners and female annotators, while professionals show higher agreement, as hypothesized. This work establishes a standardized dataset and methodological framework for animal emotion recognition research.

📝 Abstract
Dog emotion recognition plays a crucial role in enhancing human-animal interactions, veterinary care, and the development of automated systems for monitoring canine well-being. However, accurately interpreting dog emotions is challenging due to the subjective nature of emotional assessments and the absence of standardized ground truth methods. We present the CREMD (Crowd-sourced Emotional Multimodal Dogs Dataset), a comprehensive dataset exploring how different presentation modes (e.g., context, audio, video) and annotator characteristics (e.g., dog ownership, gender, professional experience) influence the perception and labeling of dog emotions. The dataset consists of 923 video clips presented in three distinct modes: without context or audio, with context but no audio, and with both context and audio. We analyze annotations from diverse participants, including dog owners, professionals, and individuals with varying demographic backgrounds and experience levels, to identify factors that influence reliable dog emotion recognition. Our findings reveal several key insights: (1) while adding visual context significantly improved annotation agreement, our findings regarding audio cues are inconclusive due to design limitations (specifically, the absence of a no-context-with-audio condition and limited clean audio availability); (2) contrary to expectations, non-owners and male annotators showed higher agreement levels than dog owners and female annotators, respectively, while professionals showed higher agreement levels, aligned with our initial hypothesis; and (3) the presence of audio substantially increased annotators' confidence in identifying specific emotions, particularly anger and fear.
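The abstract repeatedly compares "annotation agreement" across presentation modes and annotator groups. The paper does not specify which agreement statistic is used, so the sketch below is purely illustrative: it implements Fleiss' kappa, a standard chance-corrected agreement measure for multiple raters assigning categorical labels (here, hypothetical emotion categories), using only the standard library.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a ratings matrix.

    counts[i][j] = number of raters who assigned category j to clip i.
    Every clip must be rated by the same number of raters.
    Returns 1.0 for perfect agreement, ~0 for chance-level agreement.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])  # raters per item (constant across items)
    n_cats = len(counts[0])

    # Per-item observed agreement: fraction of agreeing rater pairs.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Expected agreement from the overall category distribution.
    totals = [sum(row[j] for row in counts) for j in range(n_cats)]
    p_j = [t / (n_items * n_raters) for t in totals]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)


# Toy example: 3 clips, 4 annotators, 3 emotion categories
# (e.g., happy / angry / fearful). All annotators agree on every clip.
perfect = [[4, 0, 0], [0, 4, 0], [4, 0, 0]]
print(fleiss_kappa(perfect))  # 1.0

# Partial disagreement lowers kappa (it can even go negative).
mixed = [[2, 0], [1, 1]]
print(fleiss_kappa(mixed))
```

A per-mode comparison like the paper's would compute this statistic separately on the labels collected under each of the three presentation conditions.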
Problem

Research questions and friction points this paper is trying to address.

dog emotion recognition
subjective assessment
ground truth
annotation agreement
multimodal perception
Innovation

Methods, ideas, or system contributions that make the work stand out.

dog emotion recognition
multimodal dataset
crowdsourced annotation
context and audio cues
annotation agreement
Jinho Baek
Dept. of Computer Science, New York Institute of Technology, New York, USA
Houwei Cao
Assistant Professor of Computer Science, New York Institute of Technology
Speech & Natural Language Processing, Affective Computing, Big Data Analytics
Kate Blackwell
Dept. of Computer Science, New York Institute of Technology, New York, USA