A Review of Human-Object Interaction Detection

📅 2024-08-20
🏛️ 2024 2nd International Conference on Computer, Vision and Intelligent Technology (ICCVIT)
📈 Citations: 2
Influential: 0
🤖 AI Summary
This paper surveys human-object interaction (HOI) detection, a core problem in high-level visual understanding that requires localizing humans and objects in images or videos and classifying the interactions between them. It reviews the mainstream benchmark datasets and contrasts the two dominant paradigms, two-stage and end-to-end one-stage detection frameworks, analyzing the strengths and weaknesses of each. It also discusses progress in zero-shot learning, weakly supervised learning, and the use of large-scale language models for HOI detection. Finally, it outlines the remaining challenges in HOI detection and explores potential research directions and future trends.

📝 Abstract
Human-object interaction (HOI) detection plays a key role in high-level visual understanding, facilitating a deep comprehension of human activities. Specifically, HOI detection aims to locate the humans and objects involved in interactions within images or videos and classify their specific interactions. The success of this task is influenced by several key factors, including the accurate localization of human and object instances and the correct classification of object categories and interaction relationships. This paper systematically summarizes and discusses the recent work in image-based HOI detection. First, the mainstream datasets involved in HOI relationship detection are introduced. Furthermore, starting with two-stage methods and end-to-end one-stage detection approaches, this paper comprehensively discusses the current developments in image-based HOI detection, analyzing the strengths and weaknesses of these two methods. Additionally, the advancements of zero-shot learning, weakly supervised learning, and the application of large-scale language models in HOI detection are discussed. Finally, the current challenges in HOI detection are outlined, and potential research directions and future trends are explored.
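The task described in the abstract can be illustrated with a minimal sketch of a two-stage pipeline: an object detector first returns human and object instances, and a second stage scores the interaction for each human-object pair. This is an illustration under stated assumptions, not the method of any surveyed paper. The names `Detection`, `HOITriplet`, `pair_candidates`, `detect_hoi`, and `toy_classifier` are hypothetical, and multiplying the three confidences into one score is just one common convention assumed here.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass(frozen=True)
class Detection:
    """One instance from the (assumed) first-stage object detector."""
    label: str
    box: Box
    score: float


@dataclass(frozen=True)
class HOITriplet:
    """A <human, verb, object> prediction with a combined confidence."""
    human: Detection
    verb: str
    obj: Detection
    score: float


def pair_candidates(detections: List[Detection]) -> List[Tuple[Detection, Detection]]:
    """Pair every detected person with every other detected instance."""
    humans = [d for d in detections if d.label == "person"]
    return [(h, o) for h, o in product(humans, detections) if h is not o]


def detect_hoi(
    detections: List[Detection],
    classify: Callable[[Detection, Detection], Dict[str, float]],
    threshold: float = 0.5,
) -> List[HOITriplet]:
    """Second stage: score each pair and keep confident triplets."""
    triplets = []
    for human, obj in pair_candidates(detections):
        for verb, verb_score in classify(human, obj).items():
            # Assumption: combine detector and interaction confidences by product.
            score = human.score * obj.score * verb_score
            if score >= threshold:
                triplets.append(HOITriplet(human, verb, obj, score))
    return sorted(triplets, key=lambda t: t.score, reverse=True)


def toy_classifier(human: Detection, obj: Detection) -> Dict[str, float]:
    """Hypothetical stand-in for a learned interaction classifier."""
    if obj.label == "bicycle":
        return {"ride": 0.9, "hold": 0.1}
    return {}
```

Calling `detect_hoi` with one `person` and one `bicycle` detection and `toy_classifier` yields a single `ride` triplet. An end-to-end one-stage method would instead predict such triplets directly, without the explicit pairing step.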
Problem

Research questions and friction points this paper is trying to address.

Detects and classifies human-object interactions in images/videos.
Analyzes localization and classification accuracy in HOI detection.
Explores advancements and challenges in image-based HOI detection.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage and one-stage HOI detection methods
Zero-shot and weakly supervised learning techniques
Integration of large-scale language models
Yuxiao Wang
School of Future Technology, South China University of Technology, Guangzhou, China
Qiwei Xiong
School of Future Technology, South China University of Technology, Guangzhou, China
Yu Lei
School of Information Science & Technology, Southwest Jiaotong University, Chengdu, China
Weiying Xue
School of Future Technology, South China University of Technology, Guangzhou, China
Qi Liu
School of Future Technology, South China University of Technology, Guangzhou, China
Zhenao Wei
School of Future Technology, South China University of Technology, Guangzhou, China