On the Robustness of Human-Object Interaction Detection against Distribution Shift

📅 2025-06-22
🤖 AI Summary
HOI detection is not sufficiently robust under distribution shift: existing methods largely assume ideal data distributions and therefore generalize poorly to real-world conditions. To address this, we introduce Robust-HOI, the first robustness evaluation benchmark for HOI detection, built with an automated approach. We use it to assess more than 40 state-of-the-art methods and to uncover prevalent cross-domain failure patterns. We then propose a plug-and-play robust training framework with two parts: cross-domain data augmentation integrated with mixup, and a multimodal feature fusion mechanism driven by frozen vision foundation models, intended to improve semantic alignment and domain invariance. The approach significantly improves robustness across diverse distribution shifts, including domain, viewpoint, and style shifts, and also yields gains on the standard benchmarks HICO-DET and V-COCO. The dataset and code will be released to support reproducibility.

📝 Abstract
Human-Object Interaction (HOI) detection has seen substantial advances in recent years. However, existing works focus on the standard setting with ideal images and a natural distribution, which is far from practical scenarios where distribution shifts are inevitable. This hampers the practical applicability of HOI detection. In this work, we investigate this issue by benchmarking, analyzing, and enhancing the robustness of HOI detection models under various distribution shifts. We start by proposing a novel automated approach to create the first robustness evaluation benchmark for HOI detection. We then evaluate more than 40 existing HOI detection models on this benchmark, showing that their robustness is insufficient, analyzing the characteristics of different frameworks, and discussing how robustness in HOI detection differs from robustness in other tasks. Guided by these analyses, we propose to improve the robustness of HOI detection methods through: (1) cross-domain data augmentation integrated with mixup, and (2) a feature fusion strategy with frozen vision foundation models. Both are simple, plug-and-play, and applicable to various methods. Our experimental results demonstrate that the proposed approach significantly increases the robustness of various methods and also brings benefits on standard benchmarks. The dataset and code will be released.
Problem

Research questions and friction points this paper is trying to address.

Assessing HOI detection robustness under distribution shifts
Creating a benchmark for evaluating HOI model robustness
Improving robustness via data augmentation and feature fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated robustness evaluation benchmark creation
Cross-domain data augmentation with mixup
Feature fusion using frozen vision models
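
For readers unfamiliar with mixup, the blending step behind the augmentation item in the list above can be sketched roughly as follows. This is a minimal illustration of standard mixup, not the paper's implementation: the function name `mixup`, the `alpha` value, the flat pixel lists, and the multi-hot interaction label vectors are all assumptions made for the example. How the authors pair source and shifted-domain images, or how they treat box annotations, is not described on this page.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=1.0, rng=random):
    """Blend two samples and their label vectors with one shared weight.

    Standard mixup: lam is drawn from Beta(alpha, alpha), then inputs and
    labels are interpolated with that same weight. All names and defaults
    here are hypothetical, chosen only for illustration.
    """
    if len(x_a) != len(x_b) or len(y_a) != len(y_b):
        raise ValueError("inputs and labels must have matching lengths")
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


# Toy usage: blend a "source-domain" sample with a "shifted-domain" sample.
# The pixel values and the two-class label vectors are made up.
rng = random.Random(0)
source_pixels, source_labels = [0.2, 0.4, 0.6], [1.0, 0.0]
shifted_pixels, shifted_labels = [0.8, 0.0, 0.6], [0.0, 1.0]
x_mix, y_mix, lam = mixup(
    source_pixels, source_labels, shifted_pixels, shifted_labels,
    alpha=0.4, rng=rng,
)
print(lam, x_mix, y_mix)
```

A small `alpha` concentrates the weight near 0 or 1, so most blended samples stay close to one of the two originals. For detection-style tasks the label side is usually more involved than a single vector (for example, keeping annotations from both images), so the label interpolation above should be read as a simplification.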
Authors
Chi Xie
Tongji University
Shuang Liang
Tongji University
Jie Li
SenseTime Research
Feng Zhu
SenseTime Research
Rui Zhao
SenseTime Research
Yichen Wei
SHUKUN Technology
deep learning, computer vision, medical image analysis
Shengjie Zhao
Tongji University