Twinner: Shining Light on Digital Twins in a Few Snaps

📅 2025-03-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the joint reconstruction of scene illumination, object geometry, and material appearance from a very small number of posed multi-view images, with the goal of high-fidelity digital twin modeling. To address limits in data efficiency, memory footprint, and real-domain generalization in existing approaches, the authors propose: (1) a memory-efficient voxel-grid transformer whose memory scales quadratically with the size of the voxel grid; (2) a large-scale, procedurally generated synthetic dataset of PBR-textured objects for pretraining; and (3) fine-tuning on real data through a differentiable physically based shading model, which removes the need for ground-truth illumination or material properties and narrows the synthetic-to-real gap. Evaluated on the real-world StanfordORB benchmark with only 3–5 input views, the method outperforms existing feedforward reconstruction networks and is comparable in quality to much slower per-scene optimization methods.

📝 Abstract

We present the first large reconstruction model, Twinner, capable of recovering a scene's illumination as well as an object's geometry and material properties from only a few posed images. Twinner is based on the Large Reconstruction Model and innovates in three key ways: 1) We introduce a memory-efficient voxel-grid transformer whose memory scales only quadratically with the size of the voxel grid. 2) To deal with the scarcity of high-quality ground-truth PBR-shaded models, we introduce a large, fully synthetic dataset of procedurally generated PBR-textured objects lit with varied illumination. 3) To narrow the synthetic-to-real gap, we fine-tune the model on real-life datasets by means of a differentiable physically based shading model, eschewing the need for ground-truth illumination or material properties, which are challenging to obtain in real life. We demonstrate the efficacy of our model on the real-life StanfordORB benchmark where, given few input views, we achieve reconstruction quality significantly superior to existing feedforward reconstruction networks, and comparable to significantly slower per-scene optimization methods.
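
The idea behind the third contribution, supervising through a shading model instead of ground-truth materials, can be illustrated with a deliberately tiny sketch. It is not Twinner's shading model: it assumes a diffuse-only (Lambertian) surface, a single known directional light, and a scalar albedo, and it uses a hand-written gradient where a real system would use automatic differentiation. The names `lambert_shade` and `fit_albedo` are hypothetical. The point is only that the material parameter is recovered from observed pixel values alone.

```python
import math


def lambert_shade(albedo, normal, light_dir, light_intensity):
    """Diffuse-only shading: (albedo / pi) * intensity * max(0, n . l).

    Assumes `normal` and `light_dir` are unit-length 3-vectors.
    """
    cos_theta = max(0.0, sum(n * l for n, l in zip(normal, light_dir)))
    return albedo / math.pi * light_intensity * cos_theta


def fit_albedo(observations, light_intensity, steps=200, lr=0.5):
    """Recover a scalar albedo by gradient descent on a mean squared photometric loss.

    observations: list of (normal, light_dir, observed_pixel) tuples.
    No ground-truth albedo is used; only the observed pixels supervise the fit.
    """
    albedo = 0.5  # arbitrary initial guess
    for _ in range(steps):
        grad = 0.0
        for normal, light_dir, pixel in observations:
            pred = lambert_shade(albedo, normal, light_dir, light_intensity)
            # The model is linear in albedo, so d(pred)/d(albedo) equals
            # the shading evaluated with albedo = 1.
            dpred = lambert_shade(1.0, normal, light_dir, light_intensity)
            grad += 2.0 * (pred - pixel) * dpred
        albedo -= lr * grad / len(observations)
        albedo = min(1.0, max(0.0, albedo))  # keep albedo physically plausible
    return albedo


# Synthetic "observations": one surface normal seen under three light directions.
true_albedo = 0.8
normal = (0.0, 0.0, 1.0)
light_dirs = [(0.0, 0.0, 1.0), (0.0, 0.6, 0.8), (0.6, 0.0, 0.8)]
observations = [
    (normal, l, lambert_shade(true_albedo, normal, l, math.pi)) for l in light_dirs
]

recovered = fit_albedo(observations, math.pi)
print(round(recovered, 4))  # close to the true albedo of 0.8
```

A full physically based model would add roughness, metallic, and environment lighting, and recovering lighting and material jointly introduces ambiguities that this single-parameter example avoids by fixing the light.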

Problem

Research questions and friction points this paper is trying to address.

Recovering scene illumination, geometry, and material properties from few images.

Addressing scarcity of high-quality PBR-shaded models with synthetic datasets.

Bridging synthetic-to-real gap using differentiable physically-based shading models.

Innovation

Methods, ideas, or system contributions that make the work stand out.

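Memory-efficient voxel-grid transformer whose memory scales only quadratically with the size of the voxel grid.

Large, fully synthetic dataset of procedurally generated PBR-textured objects lit with varied illumination.

Fine-tuning on real-life datasets through a differentiable physically based shading model, which narrows the synthetic-to-real gap without requiring ground-truth illumination or material properties.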
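
To put the memory-scaling claim in numbers, the sketch below compares two growth rates. It rests on assumptions the abstract does not state: that "size" means the side length N of a cubic N×N×N grid, and that the naive baseline is dense self-attention with one token per voxel. It says nothing about how Twinner actually achieves its scaling, and both function names are hypothetical.

```python
def dense_attention_entries(side):
    """Entries in one dense self-attention matrix with one token per voxel.

    A side^3 grid gives side^3 tokens, so the matrix has (side^3)^2 entries.
    """
    tokens = side ** 3
    return tokens * tokens


def quadratic_budget(side, constant=1):
    """A memory budget that grows with the square of the grid side length."""
    return constant * side ** 2


for side in (16, 32, 64):
    print(side, dense_attention_entries(side), quadratic_budget(side))
```

Under these assumptions, doubling the side length multiplies the dense attention matrix by 64 but a quadratic budget by only 4, which is why the scaling behaviour matters for higher-resolution grids.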
