Twinner: Shining Light on Digital Twins in a Few Snaps

📅 2025-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of jointly reconstructing scene illumination, object geometry, and material appearance from an extremely limited number of multi-view images—enabling high-fidelity digital twin modeling. To overcome key bottlenecks in data efficiency, memory footprint, and real-domain generalization inherent in existing approaches, the authors propose: (1) a memory-efficient voxel-grid Transformer whose memory footprint scales only quadratically with the size of the voxel grid; (2) a large-scale procedural synthetic dataset of PBR-textured objects for robust pretraining; and (3) differentiable physically based rendering (PBR) supervision that enables training without ground-truth illumination or materials, supporting synthetic-to-real domain adaptation. Evaluated on the real-world StanfordORB benchmark, the method achieves superior reconstruction quality using only 3–5 input views—outperforming feedforward baselines and matching the fidelity of much slower per-scene optimization methods.
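The memory claim can be framed with back-of-envelope arithmetic. Plain self-attention over every voxel token is quadratic in the token count—the sixth power of the grid resolution—so a memory-efficient design must avoid it. The axis-factorized variant below is a hypothetical illustration of one common way to cut that cost, not the paper's actual mechanism, which this summary does not detail.

```python
# Illustrative attention-memory comparison for a cubic voxel grid of
# resolution n (n**3 voxel tokens). Hypothetical sketch only.

def full_attention_entries(n):
    # Every voxel token attends to every other: (n**3)**2 matrix entries.
    tokens = n ** 3
    return tokens * tokens

def axis_attention_entries(n):
    # Hypothetical axis-factorized attention: each voxel attends only to
    # the n tokens along one axis, repeated for 3 axes: 3 * n**3 * n.
    return 3 * (n ** 3) * n

for n in (16, 32, 64):
    full = full_attention_entries(n)
    axis = axis_attention_entries(n)
    print(f"n={n}: full={full:.2e} axis={axis:.2e} ratio={full // axis}x")
```

At resolution 64 the full attention matrix has roughly 87,000× more entries than the factorized variant, which is why feedforward voxel transformers need some form of restricted attention to fit in memory.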

📝 Abstract
We present the first large reconstruction model, Twinner, capable of recovering a scene's illumination as well as an object's geometry and material properties from only a few posed images. Twinner is based on the Large Reconstruction Model and innovates in three key ways: 1) We introduce a memory-efficient voxel-grid transformer whose memory scales only quadratically with the size of the voxel grid. 2) To deal with the scarcity of high-quality ground-truth PBR-shaded models, we introduce a large fully-synthetic dataset of procedurally-generated PBR-textured objects lit with varied illumination. 3) To narrow the synthetic-to-real gap, we finetune the model on real-life datasets by means of a differentiable physically-based shading model, eschewing the need for ground-truth illumination or material properties, which are challenging to obtain in real life. We demonstrate the efficacy of our model on the real-life StanfordORB benchmark where, given few input views, we achieve reconstruction quality significantly superior to existing feedforward reconstruction networks, and comparable to significantly slower per-scene optimization methods.
Problem

Research questions and friction points this paper is trying to address.

Recovering scene illumination, geometry, and material properties from few images.
Addressing scarcity of high-quality PBR-shaded models with synthetic datasets.
Bridging synthetic-to-real gap using differentiable physically-based shading models.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Memory-efficient voxel-grid transformer whose memory scales only quadratically with voxel-grid size.
Large procedural synthetic dataset of PBR-textured objects under varied illumination for pretraining.
Differentiable physically based shading enables finetuning on real data without ground-truth illumination or materials, narrowing the synthetic-to-real gap.
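The third point—supervising through a differentiable shading model against real photos—can be sketched with a toy Lambertian term. This is an illustrative stand-in, not the paper's full physically based model: the key idea is that the loss is computed against the captured image itself, so no ground-truth materials or illumination are needed.

```python
import numpy as np

def lambertian_shade(albedo, normals, light_dir, light_color):
    # Toy Lambertian shading: albedo * max(0, n . l) * light color.
    # albedo: (P, 3), normals: (P, 3), light_dir: (3,), light_color: (3,).
    ndotl = np.clip(normals @ light_dir, 0.0, None)  # (P,)
    return albedo * ndotl[:, None] * light_color     # (P, 3)

def photometric_loss(albedo, normals, light_dir, light_color, target):
    # Mean-squared error against the captured photo. In a full pipeline
    # this loss would be backpropagated through the shading model to
    # update the predicted geometry, materials, and lighting jointly.
    pred = lambertian_shade(albedo, normals, light_dir, light_color)
    return float(np.mean((pred - target) ** 2))
```

Because every operation in the shading function is differentiable, gradients of the photometric loss flow back into the material and lighting predictions, which is what makes ground-truth-free finetuning on real captures possible.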