GraphiContact: Pose-aware Human-Scene Robust Contact Perception for Interactive Systems

📅 2026-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing monocular image-based methods for vertex-level human-scene contact prediction lack robustness to occlusion and perceptual noise, limiting their applicability in interactive systems. This work proposes GraphiContact, a pose-aware framework that jointly performs 3D human mesh reconstruction and contact prediction, using the reconstructed body geometry as a structural scaffold for contact reasoning. The approach transfers human priors from two pretrained Transformer encoders, introduces a Single-Image Multi-Infer Uncertainty (SIMU) training strategy that simulates occlusion and noise during learning, and employs token-level adaptive routing to keep inference single-branch and efficient at test time. Evaluated on five benchmark datasets, the method achieves consistent gains in both contact prediction and 3D human reconstruction.

📝 Abstract
Monocular vertex-level human-scene contact prediction is a fundamental capability for interactive systems such as assistive monitoring, embodied AI, and rehabilitation analysis. In this work, we study this task jointly with single-image 3D human mesh reconstruction, using reconstructed body geometry as a scaffold for contact reasoning. Existing approaches either focus on contact prediction without sufficiently exploiting explicit 3D human priors, or emphasize pose/mesh reconstruction without directly optimizing robust vertex-level contact inference under occlusion and perceptual noise. To address this gap, we propose GraphiContact, a pose-aware framework that transfers complementary human priors from two pretrained Transformer encoders and predicts per-vertex human-scene contact on the reconstructed mesh. To improve robustness in real-world scenarios, we further introduce a Single-Image Multi-Infer Uncertainty (SIMU) training strategy with token-level adaptive routing, which simulates occlusion and noisy observations during training while preserving efficient single-branch inference at test time. Experiments on five benchmark datasets show that GraphiContact achieves consistent gains on both contact prediction and 3D human reconstruction. Our code, based on the GraphiContact method, provides comprehensive 3D human reconstruction and interaction analysis, and will be publicly available at https://github.com/Aveiro-Lin/GraphiContact.
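
The abstract does not specify how SIMU simulates occlusion and noise, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the image has already been encoded into a list of patch-token feature vectors; the function names (`perturb_tokens`, `make_training_views`, `clean_view`) and the parameters `occlusion_ratio` and `noise_std` are hypothetical and chosen purely for illustration. Several perturbed views of one image are produced for training, while the test-time path uses a single unperturbed pass.

```python
import random


def perturb_tokens(tokens, rng, occlusion_ratio=0.3, noise_std=0.1):
    """Return one perturbed copy of `tokens` plus its occlusion mask.

    tokens: list of feature vectors (lists of floats), one per image patch.
    All parameter names here are illustrative assumptions.
    """
    n = len(tokens)
    n_masked = int(round(occlusion_ratio * n))
    masked = set(rng.sample(range(n), n_masked))

    view = []
    for i, tok in enumerate(tokens):
        if i in masked:
            # Simulated occlusion: the patch carries no information.
            view.append([0.0] * len(tok))
        else:
            # Simulated perceptual noise: additive Gaussian jitter.
            view.append([x + rng.gauss(0.0, noise_std) for x in tok])

    mask = [i in masked for i in range(n)]
    return view, mask


def make_training_views(tokens, n_views=3, seed=0, **kwargs):
    """Build several independently perturbed views of a single image."""
    rng = random.Random(seed)
    return [perturb_tokens(tokens, rng, **kwargs) for _ in range(n_views)]


def clean_view(tokens):
    """Test-time path: one unperturbed copy, i.e. a single branch."""
    return [list(tok) for tok in tokens]


if __name__ == "__main__":
    patch_tokens = [[1.0, 2.0] for _ in range(10)]
    for view, mask in make_training_views(patch_tokens, n_views=3):
        print(sum(mask), "of", len(mask), "tokens occluded")
```

In this sketch the extra views exist only during training, which mirrors the abstract's claim that robustness is learned without adding cost at inference; how the paper's token-level adaptive routing combines or selects among such views is not described in the source and is left out here.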
Problem

Research questions and friction points this paper is trying to address.

human-scene contact
3D human mesh reconstruction
occlusion
perceptual noise
monocular vision
Innovation

Methods, ideas, or system contributions that make the work stand out.

pose-aware contact perception
3D human mesh reconstruction
Transformer priors
vertex-level contact prediction
uncertainty-aware training
Xiaojian Lin
Tsinghua University, China
Yaomin Shen
XR System Application Research Center, Nanchang Research Institute, Zhejiang University, China
Junyuan Ma
Tsinghua University, China
Yujie Sun
Professor, Department of Chemistry, University of Cincinnati
Inorganic Chemistry, Electrochemistry, Photochemistry
Chengqing Bu
Tsinghua University, China
Wenxin Zhang
University of Chinese Academy of Sciences
Deep Learning, Self-supervised Learning, Graph Neural Networks
Zongzheng Zhang
Tsinghua University, China
Hao Fei
National University of Singapore
Vision and Language, Large Language Model, Natural Language Processing, World Modeling
Lei Jin
Beijing University of Posts and Telecommunications, China
Hao Zhao
Tsinghua University
Computer Vision