🤖 AI Summary
Existing GCN-based methods for modeling human–human interaction typically model each individual as an independent graph, neglecting inherent inter-person dependencies; moreover, predefined interaction adjacency matrices cannot adapt to action-specific and context-sensitive joint interactions. To address these limitations, we propose a dynamic node selection mechanism that requires no predefined interaction prior: an Adaptive Temporal Node Amplitude Calculation (AT-NAC) module identifies the most active motion nodes, while an External Attention (EA) mechanism models dynamic cross-subject interaction dependencies among them. The approach combines per-person GCN modeling with spatial motion magnitude analysis, adaptive temporal weighting, and learnable threshold-based node filtering, so that salient motion patterns are emphasized and irrelevant or redundant nodes are left out of interaction modeling. In extensive evaluations, the method achieves state-of-the-art performance and captures interaction relationships more effectively and flexibly.
📝 Abstract
Most GCN-based methods model interacting individuals as independent graphs, neglecting their inherent inter-dependencies. Although recent approaches use predefined interaction adjacency matrices to integrate participants, these matrices cannot adaptively capture the dynamic, context-specific joint interactions that differ across actions. In this paper, we propose the Active Node Selection with External Attention Network (ASEA), which dynamically captures interaction relationships without predefined assumptions. Our method models each participant individually using a GCN to capture intra-personal relationships, facilitating a detailed representation of their actions. To identify the most relevant nodes for interaction modeling, we introduce the Adaptive Temporal Node Amplitude Calculation (AT-NAC) module, which estimates global node activity by combining spatial motion magnitude with adaptive temporal weighting, thereby highlighting salient motion patterns while reducing irrelevant or redundant information. A learnable threshold, regularized to prevent extreme variations, then selects the most informative nodes for interaction modeling. To capture interactions, we design the External Attention (EA) module, which operates on the active nodes to model the interaction dynamics and semantic relationships between individuals. Extensive evaluations show that our method captures interaction relationships more effectively and flexibly, achieving state-of-the-art performance.
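The abstract does not give the AT-NAC formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that spatial motion magnitude is the Euclidean displacement of a joint between consecutive frames, that the adaptive temporal weighting is a softmax over per-transition logits (learned in practice, fixed here), and that the threshold regularizer is a quadratic penalty around an anchor value. All function names, parameters, and the toy data are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def motion_magnitude(seq):
    """seq[t][v] is a coordinate tuple for joint v at frame t.

    Returns mags[t][v]: displacement of joint v between frames t and t+1
    (assumed definition of "spatial motion magnitude").
    """
    return [
        [math.dist(prev, cur) for prev, cur in zip(frame_a, frame_b)]
        for frame_a, frame_b in zip(seq, seq[1:])
    ]


def node_activity(seq, temporal_logits):
    """Global per-joint activity: temporally weighted sum of displacements.

    temporal_logits holds one value per frame transition; in a real model
    these would be learned (assumption), here they are passed in directly.
    """
    if len(seq) < 2:
        raise ValueError("need at least two frames")
    mags = motion_magnitude(seq)
    if len(temporal_logits) != len(mags):
        raise ValueError("expected one logit per frame transition")
    weights = softmax(temporal_logits)
    num_nodes = len(seq[0])
    return [
        sum(w * frame[v] for w, frame in zip(weights, mags))
        for v in range(num_nodes)
    ]


def select_active_nodes(activity, threshold):
    """Indices of joints whose activity exceeds the threshold."""
    return [v for v, a in enumerate(activity) if a > threshold]


def threshold_penalty(threshold, anchor, strength=0.1):
    """Hypothetical regularizer discouraging extreme threshold values."""
    return strength * (threshold - anchor) ** 2


# Toy sequence: 3 frames, 3 joints in 2D.
# Joint 0 is static, joint 1 moves 1 unit per frame, joint 2 barely moves.
example_seq = [
    [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (1.0, 0.0), (1.0, 1.1)],
    [(0.0, 0.0), (2.0, 0.0), (1.0, 1.2)],
]
example_activity = node_activity(example_seq, [0.0, 0.0])  # uniform weights
example_active = select_active_nodes(example_activity, threshold=0.5)
```

In this toy sequence only joint 1 exceeds the threshold of 0.5, so it would be the only node passed on to interaction modeling. In a trainable version the hard comparison would typically be replaced by a differentiable relaxation so that the threshold can receive gradients; that choice is also an assumption and is not described in the abstract.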