Part-Aware Bottom-Up Group Reasoning for Fine-Grained Social Interaction Detection

📅 2025-11-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing social interaction detection methods overly rely on holistic individual representations, neglecting fine-grained behavioral cues—such as facial expressions, gaze direction, and hand gestures—and failing to explicitly model inter-individual interactions, resulting in insufficient localization of social signals and ambiguous group inference. To address this, we propose a Part-Aware Group Reasoning framework that performs body-part-level feature extraction and similarity-driven association modeling, integrating spatial configurations and multimodal social cues (gaze, expression, gesture) for bottom-up reasoning from local actions to global group structures. Evaluated on the NVI dataset, our approach significantly outperforms state-of-the-art methods. It is the first to systematically achieve fine-grained, interpretable social interaction detection, establishing a new benchmark for understanding group structure grounded in subtle behavioral signals.

Technology Category

Application Category

📝 Abstract
Social interactions often emerge from subtle, fine-grained cues such as facial expressions, gaze, and gestures. However, existing methods for social interaction detection overlook such nuanced cues and primarily rely on holistic representations of individuals. Moreover, they directly detect social groups without explicitly modeling the underlying interactions between individuals. These drawbacks limit their ability to capture localized social signals and introduce ambiguity when group configurations should be inferred from social interactions grounded in nuanced cues. In this work, we propose a part-aware bottom-up group reasoning framework for fine-grained social interaction detection. The proposed method infers social groups and their interactions using body part features and their interpersonal relations. Our model first detects individuals and enhances their features using part-aware cues, and then infers group configuration by associating individuals via similarity-based reasoning, which considers not only spatial relations but also subtle social cues that signal interactions, leading to more accurate group inference. Experiments on the NVI dataset demonstrate that our method outperforms prior methods, achieving the new state of the art.
Problem

Research questions and friction points this paper is trying to address.

Detecting social interactions from subtle cues like facial expressions and gestures
Overcoming reliance on holistic representations that overlook nuanced social signals
Explicitly modeling interpersonal relations for accurate group configuration inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses body part features for interaction detection
Infers groups via similarity-based reasoning
Enhances features with part-aware social cues
🔎 Similar Papers
No similar papers found.