🤖 AI Summary
HOI detectors are not robust enough for real-world robotic assistance, where environmental corruptions such as occlusion, noise, and illumination variation degrade their predictions. To address this, we introduce RoHOI, the first dedicated robustness benchmark for HOI detection, which applies 20 realistic corruption types to HICO-DET and V-COCO and adds novel evaluation metrics that systematically quantify performance degradation under these perturbations. To learn more robust features, we propose Semantic-Aware Masking-based Progressive Learning (SAMPL), a strategy that dynamically fuses global contextual cues with local fine-grained information via semantic-aware masking. Extensive experiments on RoHOI show that SAMPL improves the robustness of mainstream HOI detectors, yielding an average performance improvement of 5.2% across diverse corruptions while maintaining strong generalization. This work provides a standardized evaluation framework and an effective training strategy, advancing HOI detection toward practical deployment.
📝 Abstract
Human-Object Interaction (HOI) detection is crucial for robots that assist humans, because it enables context-aware support. However, models trained on clean datasets degrade under real-world conditions because of unforeseen corruptions, leading to inaccurate predictions. Despite recent advances, current models still struggle with environmental variability, occlusion, and noise. To address this, we introduce RoHOI, the first robustness benchmark for HOI detection, which evaluates model resilience under diverse challenges. RoHOI includes 20 corruption types built on the HICO-DET and V-COCO datasets, together with a new robustness-focused metric. We systematically analyze existing HOI detection models and find significant performance drops under corruption. To improve robustness, we propose a Semantic-Aware Masking-based Progressive Learning (SAMPL) strategy that guides optimization using both holistic and partial cues, dynamically adjusting the model's optimization to enhance robust feature learning. Extensive experiments show that our approach outperforms state-of-the-art methods, setting a new standard for robust HOI detection. Benchmarks, datasets, and code will be made publicly available at https://github.com/Kratos-Wen/RoHOI.
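To make the benchmark idea concrete, the following is a minimal sketch of how a corruption at a given severity and a robustness score might be computed. It is not the RoHOI implementation: the abstract does not specify the 20 corruption types, their severity levels, or the definition of the robustness-focused metric, so the function names (`add_gaussian_noise`, `mean_relative_robustness`), the five-level noise scale, and the ratio-based score are all assumptions chosen for illustration.

```python
import random

# Hypothetical noise standard deviations for severity levels 1..5.
# The actual severity scale used by RoHOI is not given in the abstract.
NOISE_SIGMA = [0.04, 0.08, 0.12, 0.16, 0.20]


def add_gaussian_noise(image, severity, seed=0):
    """Corrupt a grayscale image (rows of floats in [0, 1]) with Gaussian noise."""
    if not 1 <= severity <= len(NOISE_SIGMA):
        raise ValueError("severity must be between 1 and 5")
    sigma = NOISE_SIGMA[severity - 1]
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    # Add zero-mean noise to every pixel and clamp back into [0, 1].
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]


def mean_relative_robustness(clean_map, corrupted_maps):
    """Average of (corrupted mAP / clean mAP) over all corruptions and severities.

    This is an illustrative stand-in, not the metric proposed in the paper.
    `corrupted_maps` maps a corruption name to a list of mAP values,
    one per severity level.
    """
    if clean_map <= 0:
        raise ValueError("clean_map must be positive")
    ratios = [m / clean_map for scores in corrupted_maps.values() for m in scores]
    return sum(ratios) / len(ratios)


# Usage with made-up numbers: a detector with 30.0 mAP on clean images.
image = [[0.5, 0.5], [0.5, 0.5]]
noisy = add_gaussian_noise(image, severity=3)
score = mean_relative_robustness(
    30.0, {"gaussian_noise": [27.0, 24.0], "blur": [30.0, 15.0]}
)
# score is (0.9 + 0.8 + 1.0 + 0.5) / 4, i.e. approximately 0.8
```

A score near 1.0 in this sketch would mean the detector keeps almost all of its clean accuracy under corruption, while lower values indicate larger degradation. The mAP values above are invented for the example and are not results from the paper.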