Geometric Visual Fusion Graph Neural Networks for Multi-Person Human-Object Interaction Recognition in Videos

2025-06-01
Expert Systems with Applications
Citations: 0
Influential: 0
AI Summary
This work addresses video-based multi-person human-object interaction (HOI) recognition. We propose a Geometry-Visual Graph Neural Network (GV-GNN) that jointly models 3D human pose geometry, visual appearance features, and spatiotemporal dynamics across persons and objects. Methodologically, we explicitly incorporate 3D pose geometric priors into dynamic graph construction, design a cross-subject interaction attention mechanism, and integrate multi-scale spatiotemporal convolutions with differentiable geometric graph pooling for fine-grained joint inference. On CAD-120, V-COCO, and HICO-DET, GV-GNN achieves consistent mAP improvements of 3.2–5.7%, significantly enhancing robustness to occlusion and dense interactions. To our knowledge, this is the first work to systematically embed explicit 3D geometric priors into HOI graph modeling, establishing a novel multimodal spatiotemporal interaction understanding paradigm.

Problem

Research questions and friction points this paper is trying to address.

Fusing visual and geometric features for HOI recognition
Handling multi-person concurrent interactions in videos
Addressing occlusion and dynamic human-object relationships

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-attention feature fusion for multimodal integration
Interdependent entity graph learning for interaction modeling
Concurrent Partial Interaction Dataset for real-world scenarios
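
To make the idea of attention-based fusion of visual and geometric features concrete, the following is a minimal sketch of a softmax-gated fusion of two feature vectors for a single entity. It is not the paper's dual-attention module: the function names (`fuse`, `softmax`, `dot`) and the scoring vectors `w_visual` and `w_geometric` are hypothetical and stand in for learned parameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def fuse(visual, geometric, w_visual, w_geometric):
    """Fuse two same-length feature vectors with modality attention.

    Each modality gets a scalar score (feature dotted with a hypothetical
    scoring vector); a softmax turns the two scores into weights, and the
    fused feature is the weighted sum of the two inputs.
    """
    weights = softmax([dot(visual, w_visual), dot(geometric, w_geometric)])
    fused = [weights[0] * v + weights[1] * g for v, g in zip(visual, geometric)]
    return fused, weights


# Example: with zero scoring vectors both modalities receive equal weight.
fused, weights = fuse([1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0])
```

In this simplified form the weights always sum to one, so a modality degraded by occlusion could in principle be down-weighted in favour of the other; how the actual model achieves this is not specified on this page.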
Tanqiu Qiao
Durham University, Department of Computer Science, Durham, DH1 3LE, UK
Ruochen Li
Durham University, Department of Computer Science, Durham, DH1 3LE, UK
Frederick W. B. Li
Durham University, Department of Computer Science, Durham, DH1 3LE, UK
Yoshiki Kubotani
cvpaper.challenge, Tokyo, Japan
Shigeo Morishima
Professor of Applied Physics, Waseda University
Computer Graphics, Computer Vision, Human Computer Interaction
Hubert P. H. Shum
Professor of Visual Computing, Director of Research in Computer Science, Durham University
Responsible AI, Computer Vision, Computer Graphics, AI in Healthcare