GAViD: A Large-Scale Multimodal Dataset for Context-Aware Group Affect Recognition from Videos

📅 2026-04-17
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses group-level emotion recognition in real-world scenarios, where progress is hindered by the scarcity of large-scale multimodal datasets with contextual annotations. To bridge this gap, the authors introduce GAViD, a dataset of 5,091 video clips that combines video and audio with contextual metadata and action cues. The contextual metadata is generated with VideoGPT, which makes annotation more scalable, while the action cues are annotated by humans. The authors also propose CAGNet, a context-aware multimodal network designed for cross-modal fusion and context-driven affective reasoning. CAGNet achieves 63.20% test accuracy on GAViD, which the authors describe as comparable to state-of-the-art performance in group affect recognition.

📝 Abstract
Understanding affective dynamics in real-world social systems is fundamental to modeling and analyzing human-human interactions in complex environments. Group affect emerges from intertwined human-human interactions, contextual influences, and behavioral cues, making its quantitative modeling a challenging computational social systems problem. However, computational modeling of group affect in in-the-wild scenarios remains challenging due to limited large-scale annotated datasets and the inherent complexity of multimodal social interactions shaped by contextual and behavioral variability. The lack of comprehensive datasets annotated with multimodal and contextual information further limits advances in the field. To address this, we introduce the Group Affect from ViDeos (GAViD) dataset, comprising 5091 video clips with multimodal data (video, audio and context), annotated with ternary valence and discrete emotion labels and enriched with VideoGPT-generated contextual metadata and human-annotated action cues. We also present Context-Aware Group Affect Recognition Network (CAGNet) for multimodal context-aware group affect recognition. CAGNet achieves 63.20% test accuracy on GAViD, comparable to state-of-the-art performance. The dataset and code are available at github.com/deepakkumar-iitr/GAViD.
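
To make the annotation layers and the accuracy metric concrete, here is a minimal Python sketch of how one annotated clip and a test-accuracy computation over ternary valence might look. Everything named here is an assumption for illustration: the `ClipAnnotation` record, its field names, the three class names in `VALENCE_CLASSES`, and the `valence_accuracy` helper are hypothetical and are not taken from the GAViD repository or the CAGNet code.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Hypothetical class names: the abstract says "ternary valence"
# but does not name the three classes.
VALENCE_CLASSES = ("negative", "neutral", "positive")


@dataclass(frozen=True)
class ClipAnnotation:
    """Illustrative record for one clip; not the dataset's actual schema."""
    clip_id: str
    valence: str                  # one of VALENCE_CLASSES
    emotion: str                  # discrete emotion label
    context: str                  # generated contextual description
    action_cues: Tuple[str, ...]  # human-annotated action cues

    def __post_init__(self) -> None:
        if self.valence not in VALENCE_CLASSES:
            raise ValueError(f"unknown valence label: {self.valence!r}")


def valence_accuracy(predicted: Sequence[str],
                     gold: Sequence[ClipAnnotation]) -> float:
    """Fraction of clips whose predicted valence matches the annotation."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal in length")
    correct = sum(p == g.valence for p, g in zip(predicted, gold))
    return correct / len(gold)
```

Under these assumptions, a reported figure such as 63.20% corresponds to `valence_accuracy` returning 0.632 on the held-out test clips.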
Problem

Research questions and friction points this paper is trying to address.

group affect
multimodal dataset
context-aware
in-the-wild scenarios
computational social systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal dataset
group affect recognition
context-aware modeling
CAGNet
VideoGPT-generated metadata