PICO: Reconstructing 3D People In Contact with Objects

📅 2025-04-24

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Reconstructing 3D human-object interaction (HOI) from a single color image is difficult because of depth ambiguity, occlusion, and large variation in object shape and appearance. Prior methods therefore assume controlled settings, such as known object shapes and contacts, and cover only a limited set of object classes. This paper makes two contributions. First, it builds PICO-db, a dataset of natural images paired with dense 3D contact on both body and object meshes: object meshes are retrieved from a database using vision foundation models, and the body contact patches from the DAMON dataset are projected onto the object with only two clicks per patch. Second, it proposes PICO-fit, a render-and-compare fitting method that infers contact on the SMPL-X body, retrieves a likely object mesh and its contact from PICO-db, and uses that contact to iteratively fit the 3D body and object meshes to image evidence. According to the authors, PICO-fit works well for many object categories that no existing method can handle. Data and code are available at https://pico.is.tue.mpg.de.

📝 Abstract

Recovering 3D Human-Object Interaction (HOI) from single color images is challenging due to depth ambiguities, occlusions, and the huge variation in object shape and appearance. Thus, past work requires controlled settings such as known object shapes and contacts, and tackles only limited object classes. Instead, we need methods that generalize to natural images and novel object classes. We tackle this in two main ways: (1) We collect PICO-db, a new dataset of natural images uniquely paired with dense 3D contact on both body and object meshes. To this end, we use images from the recent DAMON dataset that are paired with contacts, but these contacts are only annotated on a canonical 3D body. In contrast, we seek contact labels on both the body and the object. To infer these given an image, we retrieve an appropriate 3D object mesh from a database by leveraging vision foundation models. Then, we project DAMON's body contact patches onto the object via a novel method needing only 2 clicks per patch. This minimal human input establishes rich contact correspondences between bodies and objects. (2) We exploit our new dataset of contact correspondences in a novel render-and-compare fitting method, called PICO-fit, to recover 3D body and object meshes in interaction. PICO-fit infers contact for the SMPL-X body, retrieves a likely 3D object mesh and contact from PICO-db for that object, and uses the contact to iteratively fit the 3D body and object meshes to image evidence via optimization. Uniquely, PICO-fit works well for many object categories that no existing method can tackle. This is crucial to enable HOI understanding to scale in the wild. Our data and code are available at https://pico.is.tue.mpg.de.
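To make the role of contact correspondences concrete, here is a minimal sketch of how paired contact points could place an object relative to a body. It is not the PICO-fit optimizer: the abstract describes an iterative render-and-compare fit, whereas this sketch swaps in the closed-form Kabsch algorithm on 3D point pairs and ignores image evidence entirely. The function names `fit_rigid_from_contacts` and `contact_loss`, and the assumption that contact points arrive as matched N x 3 arrays, are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_rigid_from_contacts(obj_pts, body_pts):
    """Least-squares rigid transform (R, t) mapping obj_pts onto body_pts.

    Illustrative stand-in (Kabsch algorithm), not the paper's method.
    obj_pts, body_pts: matched contact points, arrays of shape (N, 3).
    """
    obj_pts = np.asarray(obj_pts, dtype=float)
    body_pts = np.asarray(body_pts, dtype=float)

    # Center both point sets so that only the rotation remains to be solved.
    obj_c = obj_pts.mean(axis=0)
    body_c = body_pts.mean(axis=0)

    # Cross-covariance between the centered sets.
    H = (obj_pts - obj_c).T @ (body_pts - body_c)
    U, _, Vt = np.linalg.svd(H)

    # Guard against reflections: force det(R) = +1.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Translation that maps the object centroid onto the body centroid.
    t = body_c - R @ obj_c
    return R, t


def contact_loss(obj_pts, body_pts, R, t):
    """Mean squared distance between transformed object and body contacts."""
    moved = np.asarray(obj_pts, dtype=float) @ R.T + t
    diff = moved - np.asarray(body_pts, dtype=float)
    return float(np.mean(np.sum(diff ** 2, axis=1)))


if __name__ == "__main__":
    # Hypothetical toy data: object contacts in the object's own frame,
    # body contacts obtained by shifting them in space.
    obj = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    body = obj + np.array([0.2, -0.1, 0.4])
    R, t = fit_rigid_from_contacts(obj, body)
    print(contact_loss(obj, body, R, t))
```

In a full pipeline, a contact term like `contact_loss` would plausibly be one of several objectives, combined with image-based terms and minimized iteratively rather than solved in closed form.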

Problem

Research questions and friction points this paper is trying to address.

Recovering 3D Human-Object Interaction from single images

Generalizing to natural images and novel object classes

Inferring contact for 3D body and object meshes

Innovation

Methods, ideas, or system contributions that make the work stand out.

Collects PICO-db dataset with dense 3D contact labels

Uses vision foundation models for 3D object retrieval

Develops PICO-fit for render-and-compare mesh fitting
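The retrieval step listed above can be sketched as a nearest-neighbor lookup over embeddings. The source only states that vision foundation models are used to retrieve a suitable 3D object mesh from a database; how the features are computed and compared is not specified here, so the cosine-similarity matching, the function name `retrieve_mesh`, and the precomputed embedding arrays below are all assumptions made for illustration.

```python
import numpy as np


def retrieve_mesh(query_emb, mesh_embs, mesh_ids):
    """Return the id and score of the mesh whose embedding best matches the query.

    Hypothetical sketch: embeddings are assumed to be precomputed by some
    vision foundation model; cosine similarity is an assumed matching rule.
    query_emb: shape (D,), mesh_embs: shape (M, D), mesh_ids: list of length M.
    """
    query_emb = np.asarray(query_emb, dtype=float)
    mesh_embs = np.asarray(mesh_embs, dtype=float)

    # Normalize so that the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    m = mesh_embs / np.linalg.norm(mesh_embs, axis=1, keepdims=True)

    scores = m @ q
    best = int(np.argmax(scores))
    return mesh_ids[best], float(scores[best])


if __name__ == "__main__":
    # Toy 2D embeddings standing in for real image and mesh features.
    mesh_ids = ["chair", "cup", "bicycle"]
    mesh_embs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    best_id, score = retrieve_mesh(np.array([0.9, 0.1]), mesh_embs, mesh_ids)
    print(best_id, score)
```

Normalizing both sides keeps the comparison independent of embedding magnitude, which is a common choice for feature matching but not one the source confirms.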
