Human Interaction-Aware 3D Reconstruction from a Single Image

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multi-person 3D reconstruction methods often suffer from incomplete geometry and implausible interpenetrations when handling occlusions and close interactions. This work proposes HUG3D, a framework that explicitly incorporates group-level contextual cues and physical interaction priors, jointly modeling individual and collective information in a canonical orthographic space so that occlusion, contact, and spatial relationships can be resolved collaboratively. HUG3D comprises two core modules, Human Group-Instance Multi-View Diffusion (HUG-MVD) and Human Group-Instance Geometric Reconstruction (HUG-GR), which combine multi-view normal and image generation with physics-aware geometric refinement, and fuse the generated views into a high-fidelity texture. Requiring only a single input image, HUG3D significantly outperforms state-of-the-art single- and multi-person reconstruction approaches in both reconstruction fidelity and physical plausibility.
📝 Abstract
Reconstructing textured 3D human models from a single image is fundamental for AR/VR and digital human applications. However, existing methods mostly focus on single individuals and thus fail in multi-human scenes, where naive composition of individual reconstructions often leads to artifacts such as unrealistic overlaps, missing geometry in occluded regions, and distorted interactions. These limitations highlight the need for approaches that incorporate group-level context and interaction priors. We introduce a holistic method that explicitly models both group- and instance-level information. To mitigate perspective-induced geometric distortions, we first transform the input into a canonical orthographic space. Our primary component, Human Group-Instance Multi-View Diffusion (HUG-MVD), then generates complete multi-view normals and images by jointly modeling individuals and group context to resolve occlusions and proximity. Subsequently, the Human Group-Instance Geometric Reconstruction (HUG-GR) module optimizes the geometry by leveraging explicit, physics-based interaction priors to enforce physical plausibility and accurately model inter-human contact. Finally, the multi-view images are fused into a high-fidelity texture. Together, these components form our complete framework, HUG3D. Extensive experiments show that HUG3D significantly outperforms both single-human and existing multi-human methods, producing physically plausible, high-fidelity 3D reconstructions of interacting people from a single image. Project page: https://jongheean11.github.io/HUG3D_project
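The abstract says HUG-GR enforces physical plausibility with explicit, physics-based interaction priors, but the page does not give the loss formulation. As a rough, hypothetical illustration of what such an interpenetration prior can look like (not the authors' actual method), here is a minimal sphere-overlap penalty between two people's vertex sets; the function name and the `radius` parameter are assumptions for the sketch:

```python
import numpy as np

def interpenetration_penalty(verts_a, verts_b, radius=0.02):
    """Toy physics prior: treat each vertex as a small sphere of the given
    radius and penalize vertex pairs from the two people that come closer
    than 2 * radius. Real systems use differentiable analogues of this idea
    (e.g. SDF-based collision terms) as a loss during geometry optimization.

    verts_a: (N, 3) array of one person's vertices
    verts_b: (M, 3) array of the other person's vertices
    """
    # Pairwise Euclidean distances between the two vertex sets, shape (N, M)
    d = np.linalg.norm(verts_a[:, None, :] - verts_b[None, :, :], axis=-1)
    # Positive only where the spheres intersect; zero for separated pairs
    overlap = np.maximum(2.0 * radius - d, 0.0)
    return float(np.sum(overlap ** 2))

# Two well-separated toy "people": no overlap, so the penalty is zero
a = np.zeros((4, 3))
b = np.ones((4, 3))
print(interpenetration_penalty(a, b))  # 0.0
```

Minimizing a term like this alongside a reconstruction loss pushes overlapping geometry apart; a contact prior would do the opposite for known contact regions, rewarding distances near zero instead of penalizing them.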
Problem

Research questions and friction points this paper is trying to address.

3D reconstruction
multi-human interaction
single-image
occlusion handling
physical plausibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-human 3D reconstruction
interaction-aware modeling
group-instance context
physics-based priors
multi-view diffusion
Gwanghyun Kim
Seoul National University (SNU)
Generative AI, Multimodal Learning, Computer Vision, 3D, Digital Humans
Junghun James Kim
IPAI, Seoul National University, Republic of Korea
Suh Yoon Jeon
Dept. of Electrical and Computer Engineering, Seoul National University, Republic of Korea
Jason Park
Dept. of Electrical and Computer Engineering, Seoul National University, Republic of Korea
Se Young Chun
Department of Electrical and Computer Engineering, Seoul National University
computational imaging, machine learning, signal processing, multimodal processing