LEXIS: LatEnt ProXimal Interaction Signatures for 3D HOI from an Image

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the reconstruction of 3D human-object interactions (HOI) from a single RGB image, a task that requires modeling the dense, continuous spatial proximity between the human body and the object. The authors propose InterFields, a representation that encodes this proximity across the entire body and object surfaces. They learn a discrete manifold of interaction signatures, LEXIS, with a VQ-VAE, and build LEXIS-Flow, a diffusion model that uses these signatures to jointly estimate the human and object meshes together with their InterFields. The predicted InterFields then guide a refinement step that yields physically plausible, proximity-aware reconstructions without post-hoc optimization. On the Open3DHOI and BEHAVE datasets, the method outperforms existing state-of-the-art baselines in reconstruction, contact, and proximity quality, and it also generalizes better.
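
To make the contrast between sparse binary contact and dense, continuous proximity concrete, here is a minimal sketch of one plausible proximity field over a toy body and object. It is not the paper's definition of InterFields, which the source does not spell out: the nearest-vertex distance, the exponential kernel exp(-d / tau), and the names `nearest_distances`, `binary_contact`, `proximity_field`, `threshold`, and `tau` are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def nearest_distances(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Unsigned distance from every point in src to its nearest point in dst.

    Brute force, O(len(src) * len(dst)). A real pipeline would more likely use a
    spatial index and point-to-triangle distances rather than point-to-vertex.
    """
    return [min(math.dist(p, q) for q in dst) for p in src]


def binary_contact(dists: Sequence[float], threshold: float = 0.02) -> List[int]:
    """Sparse binary cue: 1 where a vertex lies within `threshold` of the other surface."""
    return [1 if d < threshold else 0 for d in dists]


def proximity_field(dists: Sequence[float], tau: float = 0.05) -> List[float]:
    """Dense continuous cue in (0, 1] that decays smoothly with distance (hypothetical kernel)."""
    return [math.exp(-d / tau) for d in dists]


# Toy "meshes" reduced to a few vertices (units assumed to be metres).
body = [(0.0, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.5, 0.0)]
obj = [(0.01, 0.0, 0.0), (0.0, 0.2, 0.0)]

# Proximity is defined on both surfaces: one value per body vertex and per object vertex.
body_to_obj = nearest_distances(body, obj)
obj_to_body = nearest_distances(obj, body)

print(binary_contact(body_to_obj))   # [1, 0, 0]
print(proximity_field(body_to_obj))  # strictly decreasing across the three body vertices
```

The binary cue assigns the same label to the second and third body vertices (0.1 and 0.3 away), while the continuous field still distinguishes them; that extra signal is what a dense proximity representation is meant to carry.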

📝 Abstract

Reconstructing 3D Human-Object Interaction from an RGB image is essential for perceptive systems. Yet, this remains challenging as it requires capturing the subtle physical coupling between the body and objects. Current methods rely on sparse, binary contact cues, which fail to model the continuous proximity and dense spatial relationships that characterize natural interactions. We address this limitation via InterFields, a representation that encodes dense, continuous proximity across the entire body and object surfaces. However, inferring these fields from single images is inherently ill-posed. To tackle this, our intuition is that interaction patterns are characteristically structured by the action and object geometry. We capture this structure in LEXIS, a novel discrete manifold of interaction signatures learned via a VQ-VAE. We then develop LEXIS-Flow, a diffusion framework that leverages LEXIS signatures to estimate human and object meshes alongside their InterFields. Notably, these InterFields guide a refinement step that ensures physically plausible, proximity-aware reconstructions without requiring post-hoc optimization. Evaluation on Open3DHOI and BEHAVE shows that LEXIS-Flow significantly outperforms existing SotA baselines in reconstruction, contact, and proximity quality. Our approach not only improves generalization but also yields reconstructions perceived as more realistic, moving us closer to holistic 3D scene understanding. Code & models will be public at https://anticdimi.github.io/lexis.
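
LEXIS is described as a discrete manifold of interaction signatures learned with a VQ-VAE. The step that makes a VQ-VAE's latent space discrete is vector quantization: each continuous encoder output is snapped to its nearest entry in a learned codebook, and the resulting indices form a discrete code. The sketch below shows only that lookup. The encoder, decoder, codebook size, and the actual contents of a LEXIS signature are not given in the source, so the toy codebook and the names `quantize`, `lookup`, and `signature` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each continuous encoder output to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))
        for z in latents
    ]


def lookup(indices: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Decoder-side step: replace each discrete index with its codebook vector."""
    return [list(codebook[k]) for k in indices]


# Toy codebook of 3 entries in a 2-D latent space (hypothetical sizes).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
latents = [[0.9, 0.1], [0.1, 0.2], [0.2, 0.8]]

signature = quantize(latents, codebook)
print(signature)                    # [1, 0, 2]
print(lookup(signature, codebook))  # [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
```

During training, a VQ-VAE typically passes gradients through this non-differentiable nearest-neighbour step with a straight-through estimator and adds codebook and commitment losses; those pieces are omitted here.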

Problem

3D Human-Object Interaction

Proximity Modeling

Single-image Reconstruction

Physical Coupling

Interaction Representation

Innovation

InterFields

LEXIS

VQ-VAE

diffusion model

3D human-object interaction