🤖 AI Summary
This paper tackles single-image 3D object reconstruction in natural scenes, where occlusion and clutter make it difficult to estimate geometry, texture, and spatial layout accurately, by proposing a visually grounded generative reconstruction framework. Methodologically, the authors (1) design a human- and model-in-the-loop annotation pipeline to build a visually grounded real-world 3D dataset at unprecedented scale; (2) adopt a multi-stage training paradigm that combines synthetic-data pretraining with real-data alignment to overcome the scarcity of 3D supervision; and (3) jointly predict object shape, texture, and pose from a single image, leveraging scene context where direct visual evidence is occluded. In human preference evaluations on real-image reconstruction, the method wins over state-of-the-art baselines at a rate of at least 5:1. To foster reproducibility and community adoption, the authors will publicly release the code, pretrained models, an interactive online demo, and a new benchmark dataset.
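To make the headline number concrete, the snippet below is a minimal sketch of how a pairwise win rate such as 5:1 is typically computed from human preference judgments. It is illustrative only, not the authors' evaluation code; the labels, counts, and function name are assumptions.

```python
# Illustrative only: computing a pairwise win rate from human preference votes.
# NOT the authors' evaluation code; the data layout and numbers are made up.
from collections import Counter

def win_rate(judgments):
    """judgments: iterable of 'ours', 'baseline', or 'tie' labels,
    one per pairwise comparison shown to an annotator."""
    counts = Counter(judgments)
    wins, losses = counts["ours"], counts["baseline"]
    if losses == 0:
        return float("inf")  # undefeated in this sample
    return wins / losses     # ratio of wins to losses, ignoring ties

# Example with fabricated counts: 500 wins vs. 100 losses gives a 5:1 win rate.
sample = ["ours"] * 500 + ["baseline"] * 100 + ["tie"] * 50
print(f"win rate = {win_rate(sample):.1f}:1")  # -> win rate = 5.0:1
```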
📝 Abstract
We present SAM 3D, a generative model for visually grounded 3D object reconstruction, predicting geometry, texture, and layout from a single image. SAM 3D excels in natural images, where occlusion and scene clutter are common and visual recognition cues from context play a larger role. We achieve this with a human- and model-in-the-loop pipeline for annotating object shape, texture, and pose, providing visually grounded 3D reconstruction data at unprecedented scale. We learn from this data in a modern, multi-stage training framework that combines synthetic pretraining with real-world alignment, breaking the 3D "data barrier". We obtain significant gains over recent work, with at least a 5:1 win rate in human preference tests on real-world objects and scenes. We will release our code and model weights, an online demo, and a new challenging benchmark for in-the-wild 3D object reconstruction.
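As a rough illustration of the multi-stage recipe the abstract describes (synthetic pretraining followed by real-world alignment), here is a schematic two-stage training loop. It is a sketch under stated assumptions: the toy model, data loaders, loss, learning rates, and step counts are placeholders, not the SAM 3D implementation.

```python
# Schematic sketch of a two-stage recipe: synthetic pretraining, then alignment
# on real, visually grounded annotations. All components below are placeholders
# chosen so the example runs end to end; they are not the authors' code.
import torch
import torch.nn as nn

def run_stage(model, loader, lr, steps):
    """Run one training stage over `loader` for `steps` optimizer updates."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()  # placeholder stand-in for a 3D reconstruction loss
    for _, (image, target_3d) in zip(range(steps), loader):
        pred = model(image)              # predicted shape/texture/pose code
        loss = loss_fn(pred, target_3d)  # supervision from 3D annotations
        opt.zero_grad()
        loss.backward()
        opt.step()

# Toy stand-ins so the sketch is runnable.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
def fake_loader():
    while True:
        yield torch.randn(8, 3, 32, 32), torch.randn(8, 64)

run_stage(model, fake_loader(), lr=1e-4, steps=10)  # Stage 1: synthetic pretraining
run_stage(model, fake_loader(), lr=1e-5, steps=5)   # Stage 2: real-data alignment
```

The point of the second stage is only that real, human-annotated data is used to align a model already pretrained on abundant synthetic 3D supervision; the specific losses and schedules are not given in the abstract.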