🤖 AI Summary
Existing weakly supervised group activity recognition methods rely on detectors or attention mechanisms to localize individual action regions but neglect the semantics of those regions, which can hurt recognition performance. To address this, we propose a visual conceptual knowledge guided recognition framework: (1) we construct individual action prototypes from the training set as visual-semantic units; (2) we generate prototype-guided action maps based on the image correlation theorem; and (3) we incorporate group-level statistical priors to model the distribution of individual actions under each group activity. The augmented action maps are combined with action semantic representations for group activity recognition. Experiments on the Volleyball and NBA datasets demonstrate the effectiveness of the method, even with limited training data. The code will be made publicly available.
📝 Abstract
Existing weakly supervised group activity recognition methods rely on object detectors or attention mechanisms to capture key areas automatically. However, they overlook the semantic information associated with the captured areas, which may adversely affect recognition performance. In this paper, we propose a novel framework named Visual Conceptual Knowledge Guided Action Map (VicKAM), which captures the locations of individual actions and integrates them with action semantics for weakly supervised group activity recognition. It generates individual action prototypes from the training set as visual conceptual knowledge to bridge action semantics and visual representations. Guided by this knowledge, VicKAM produces action maps, based on the image correlation theorem, that indicate the likelihood of each action occurring at various locations. It further augments the individual action maps with group-activity-related statistical information, which represents the distribution of individual actions under different group activities, to establish connections between action maps and specific group activities. The augmented action maps are combined with action semantic representations for group activity recognition. Extensive experiments on two public benchmarks, the Volleyball and NBA datasets, demonstrate the effectiveness of the proposed method, even with limited training data. The code will be released later.
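To make the role of the image correlation theorem concrete, the following is a minimal sketch, not the authors' implementation. It relies only on the standard result that cross-correlation in the spatial domain equals an element-wise product with the complex conjugate in the frequency domain. The function name `correlation_map`, the single-channel 2D inputs, and the circular boundary handling are all assumptions made for illustration; the abstract does not specify how VicKAM represents features or prototypes.

```python
import numpy as np


def correlation_map(feature_map: np.ndarray, prototype: np.ndarray) -> np.ndarray:
    """Circular cross-correlation of a 2D map with a smaller 2D template.

    Hypothetical helper: entry (i, j) of the result is the response obtained
    when the template's top-left corner is placed at (i, j), with wrap-around
    at the borders.
    """
    height, width = feature_map.shape
    h, w = prototype.shape

    # Zero-pad the template to the map size so both spectra have the same shape.
    padded = np.zeros((height, width), dtype=float)
    padded[:h, :w] = prototype

    # Correlation theorem: corr(f, g) = IFFT( FFT(f) * conj(FFT(g)) ).
    spectrum = np.fft.fft2(feature_map) * np.conj(np.fft.fft2(padded))

    # Inputs are real, so the imaginary part is only numerical noise.
    return np.real(np.fft.ifft2(spectrum))


# Toy usage: plant a 2x2 pattern at row 3, column 4 and recover its location.
pattern = np.array([[1.0, 2.0], [3.0, 4.0]])
toy_map = np.zeros((8, 8))
toy_map[3:5, 4:6] = pattern

response = correlation_map(toy_map, pattern)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(response), response.shape))
print(peak)  # (3, 4)
```

In this toy setting the response peaks where the template matches the map, which is the sense in which a correlation-based map can indicate where an action prototype is likely to occur. A real feature map would have many channels and one prototype per action class; how those are combined is not described in the abstract and is left out here.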