GAViD: A Large-Scale Multimodal Dataset for Context-Aware Group Affect Recognition from Videos

📅 2026-04-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses group-level emotion recognition in real-world scenarios, where progress is limited by the scarcity of large-scale multimodal datasets with contextual annotations. To bridge this gap, the authors introduce GAViD, a dataset of 5,091 video clips that combines video, audio, and contextual metadata with human-annotated action cues. Clips carry ternary valence and discrete emotion labels, and the contextual metadata is generated with VideoGPT to keep annotation scalable. The authors also propose CAGNet, a context-aware multimodal network designed for cross-modal fusion and context-driven affective reasoning. CAGNet reaches 63.20% test accuracy on GAViD, which the authors report as comparable to state-of-the-art performance in group affect recognition.

📝 Abstract

Understanding affective dynamics in real-world social systems is fundamental to modeling and analyzing human-human interactions in complex environments. Group affect emerges from intertwined human-human interactions, contextual influences, and behavioral cues, making its quantitative modeling a challenging computational social systems problem. However, computational modeling of group affect in in-the-wild scenarios remains challenging due to limited large-scale annotated datasets and the inherent complexity of multimodal social interactions shaped by contextual and behavioral variability. The lack of comprehensive datasets annotated with multimodal and contextual information further limits advances in the field. To address this, we introduce the Group Affect from ViDeos (GAViD) dataset, comprising 5091 video clips with multimodal data (video, audio and context), annotated with ternary valence and discrete emotion labels and enriched with VideoGPT-generated contextual metadata and human-annotated action cues. We also present Context-Aware Group Affect Recognition Network (CAGNet) for multimodal context-aware group affect recognition. CAGNet achieves 63.20% test accuracy on GAViD, comparable to state-of-the-art performance. The dataset and code are available at github.com/deepakkumar-iitr/GAViD.
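
To make the annotation layout and the reported metric concrete, here is a minimal Python sketch of how one annotated clip and a test-accuracy computation could be represented. It is an illustration only, not the released GAViD schema or the authors' evaluation code: the class names `Valence` and `ClipSample`, every field name, and the three valence class names are assumptions (the abstract says only that valence is ternary).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Sequence


class Valence(Enum):
    # Assumed class names; the abstract only states that valence is ternary.
    NEGATIVE = 0
    NEUTRAL = 1
    POSITIVE = 2


@dataclass
class ClipSample:
    """Hypothetical record for one annotated clip (not the released schema)."""
    video_path: str
    audio_path: str
    context_text: str   # generated contextual metadata for the clip
    valence: Valence    # ternary valence label
    emotion: str        # discrete emotion label; vocabulary not specified here
    action_cues: List[str] = field(default_factory=list)  # human-annotated cues


def accuracy(predicted: Sequence[Valence], gold: Sequence[Valence]) -> float:
    """Fraction of clips whose predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)
```

Under this definition, a reported test accuracy of 63.20% means that the predicted label matches the annotated label on roughly 63 of every 100 test clips.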

Problem

Research questions and friction points this paper is trying to address.

group affect

multimodal dataset

context-aware

in-the-wild scenarios

computational social systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal dataset

group affect recognition

context-aware modeling