🤖 AI Summary
Existing approaches to estimating shared attention often neglect explicit group detection or assume a single point of attention per image, which limits both performance and practical applicability. This work proposes the first end-to-end framework that jointly models group detection and shared attention estimation through a two-stage pipeline: it first fuses individual gaze heatmaps with scalar group-membership indicators to generate an initial shared attention heatmap, then refines the group relationships and produces the final heatmap via a feedback mechanism. By overcoming the limitations of single-attention-point assumptions and group-agnostic modeling, the proposed method achieves significant improvements over existing approaches on both tasks, with ablation studies confirming the contribution of each component.
📝 Abstract
This paper proposes an end-to-end shared attention estimation method via group detection. Most previous methods estimate shared attention (SA) without detecting the actual group of people focusing on it, or assume that a given image contains a single SA point. These issues limit the practical applicability of SA detection and degrade performance. To address them, we propose to achieve group detection and shared attention estimation simultaneously using a two-step process: (i) generation of an initial SA heatmap from individual gaze attention heatmaps and group-membership scalars estimated by group inference; (ii) refinement of the initial group memberships to account for the initial SA heatmap, followed by prediction of the final SA heatmap. Experiments demonstrate that our method outperforms existing methods in both group detection and shared attention estimation, and additional analyses validate the effectiveness of the proposed components. Code: https://github.com/chihina/sagd-CVPRW2026.
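The two-step process described above can be illustrated with a minimal NumPy sketch. This is not the paper's actual model: the function names, the weighted-sum fusion, and the overlap-based membership refinement are all illustrative assumptions standing in for the learned group-inference and feedback modules.

```python
import numpy as np

def estimate_shared_attention(gaze_heatmaps, memberships):
    """Step (i): fuse individual gaze heatmaps, weighted by
    group-membership scalars, into a single SA heatmap (illustrative)."""
    w = memberships / (memberships.sum() + 1e-8)      # normalize weights
    return np.tensordot(w, gaze_heatmaps, axes=1)     # (H, W) heatmap

def refine_memberships(gaze_heatmaps, sa_heatmap):
    """Step (ii): re-score each person by how strongly their gaze
    heatmap overlaps the initial SA heatmap (illustrative rule)."""
    scores = np.array([(h * sa_heatmap).sum() for h in gaze_heatmaps])
    return scores / (scores.max() + 1e-8)

# Toy example: three people on an 8x8 grid; two look at the same spot.
heatmaps = np.zeros((3, 8, 8))
heatmaps[0, 2, 2] = 1.0   # person 0 attends (2, 2)
heatmaps[1, 2, 2] = 1.0   # person 1 attends (2, 2)
heatmaps[2, 6, 6] = 1.0   # person 2 attends elsewhere

memberships = np.array([0.9, 0.8, 0.5])        # initial group-inference scores
sa = estimate_shared_attention(heatmaps, memberships)
refined = refine_memberships(heatmaps, sa)     # person 2's score drops
final_sa = estimate_shared_attention(heatmaps, refined)
peak = tuple(int(i) for i in np.unravel_index(final_sa.argmax(), final_sa.shape))
print(peak)  # the SA peak lands at (2, 2), where the group is looking
```

In the actual method both stages are learned end-to-end; the sketch only shows the data flow: heatmaps and membership scalars in, an initial SA heatmap out, memberships refined against it, and a final SA heatmap recomputed.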