Geometric Visual Fusion Graph Neural Networks for Multi-Person Human-Object Interaction Recognition in Videos

📅 2025-06-01
🏛️ Expert Systems with Applications
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses video-based multi-person human-object interaction (HOI) recognition. We propose a Geometry-Visual Graph Neural Network (GV-GNN) that jointly models 3D human pose geometry, visual appearance features, and spatiotemporal dynamics across persons and objects. Methodologically, we explicitly incorporate 3D pose geometric priors into dynamic graph construction, design a cross-subject interaction attention mechanism, and integrate multi-scale spatiotemporal convolutions with differentiable geometric graph pooling for fine-grained joint inference. On CAD-120, V-COCO, and HICO-DET, GV-GNN achieves consistent mAP improvements of 3.2–5.7%, significantly enhancing robustness to occlusion and dense interactions. To our knowledge, this is the first work to systematically embed explicit 3D geometric priors into HOI graph modeling, establishing a novel multimodal spatiotemporal interaction understanding paradigm.
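
To make the idea of geometry-driven graph construction concrete, the following is a minimal sketch, not the paper's implementation. It assumes each person or object in a frame is reduced to a single 3D center point and that edge strength decays with distance through a Gaussian kernel. The function name `build_geometric_graph` and the parameters `sigma` and `threshold` are hypothetical choices made for illustration only.

```python
import math


def build_geometric_graph(positions, sigma=1.0, threshold=0.1):
    """Return {(i, j): weight} for entity pairs that are close in 3D.

    positions: list of (x, y, z) entity centers (assumed input format).
    Weights come from a Gaussian kernel of the Euclidean distance; pairs
    whose weight falls below `threshold` get no edge.
    """
    edges = {}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dist = math.dist(positions[i], positions[j])
            weight = math.exp(-(dist ** 2) / (2 * sigma ** 2))
            if weight >= threshold:
                edges[(i, j)] = weight
    return edges


# One frame: a person at the origin, a nearby object, and a distant object.
frame = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
graph = build_geometric_graph(frame)
```

Rebuilding the edge set per frame is what would make such a graph dynamic: as entities move, edges appear and vanish. Only the person and the nearby object are linked here, because the distant object's weight falls below the threshold.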

Problem

Research questions and friction points this paper is trying to address.

Fusing visual and geometric features for HOI recognition
Handling multi-person concurrent interactions in videos
Addressing occlusion and dynamic human-object relationships

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-attention feature fusion for multimodal integration
Interdependent entity graph learning for interaction modeling
Concurrent Partial Interaction Dataset for real-world scenarios
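
As a rough illustration of attention-based multimodal fusion, the sketch below weights a visual and a geometric feature vector by softmax attention scores and sums them. It is a simplified single-attention stand-in, not the paper's dual-attention design, whose details are not given here. The names `fuse` and `query` are hypothetical, the two feature vectors are assumed to share one dimension, and in a trained model `query` would be a learned parameter.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(visual, geometric, query):
    """Attention-weighted sum of two same-length feature vectors.

    Each modality is scored by a scaled dot product with `query`;
    the softmax of the two scores gives the fusion weights.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * f for q, f in zip(query, feat)) / scale
        for feat in (visual, geometric)
    ]
    w_vis, w_geo = softmax(scores)
    fused = [w_vis * v + w_geo * g for v, g in zip(visual, geometric)]
    return fused, (w_vis, w_geo)


# Toy features: the query scores both modalities equally.
fused, weights = fuse([1.0, 0.0], [0.0, 1.0], query=[1.0, 1.0])
```

When one modality matches the query better, for example the geometric stream while appearance is occluded, its weight grows and it dominates the fused vector.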