🤖 AI Summary
Existing large reconstruction models (LRMs) achieve strong geometric reconstruction from sparse views but struggle to accurately recover unseen regions and glossy materials, and cannot produce relightable 3D content for standard graphics engines. This paper proposes the first sub-second joint reconstruction framework for sparse-view inputs, simultaneously generating high-fidelity geometry (represented as a hexa-plane neural signed distance field), spatially varying material properties, and view-dependent radiance fields. The method introduces a progressive multi-view update mechanism and neural directional embeddings, integrated within a transformer architecture and trained with a coarse-to-fine strategy on a large-scale shape-and-material dataset. Quantitatively, it matches dense-view optimization methods in geometric and relighting accuracy while accelerating inference by roughly two orders of magnitude (<1 s). It also integrates natively with standard graphics engines and supports real-time relighting, significantly improving its practicality and deployment potential.
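The progressive multi-view update mechanism can be pictured as a recurrent refinement loop: a latent reconstruction state is updated each time a new input view arrives, rather than re-encoding all views from scratch. The sketch below is purely illustrative and assumes a simple gated update over a flat latent vector; the names (`W_enc`, `W_gate`, `W_cand`, `update`) are hypothetical stand-ins for the paper's transformer-based update model.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # latent dimension (illustrative; the real model uses large token sets)

# Hypothetical learned weights; in the actual model these live inside a transformer.
W_enc = rng.normal(scale=0.1, size=(D, D))
W_gate = rng.normal(scale=0.1, size=(2 * D, D))
W_cand = rng.normal(scale=0.1, size=(2 * D, D))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update(state, view):
    """Gated update: decide, per channel, how much the new view overwrites the state."""
    h = np.concatenate([state, view])
    g = sigmoid(h @ W_gate)       # 0 = keep the old reconstruction, 1 = take the candidate
    cand = np.tanh(h @ W_cand)    # candidate reconstruction from old state + new view
    return (1.0 - g) * state + g * cand

state = np.zeros(D)               # empty reconstruction before any views arrive
for _ in range(4):                # four sparse input views, added one at a time
    view = rng.normal(size=(D,)) @ W_enc   # stand-in for the image encoder
    state = update(state, view)
print(state.shape)  # (8,)
```

The point of this structure is that adding a fifth view only costs one more update step, which is what lets reconstruction quality improve incrementally as views are appended.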
📝 Abstract
We present the Large Inverse Rendering Model (LIRM), a transformer architecture that jointly reconstructs high-quality shape, materials, and radiance fields with view-dependent effects in less than a second. Our model builds upon recent Large Reconstruction Models (LRMs), which achieve state-of-the-art sparse-view reconstruction quality. However, existing LRMs struggle to reconstruct unseen parts accurately, and they cannot recover glossy appearance or generate relightable 3D content that standard graphics engines can consume. To address these limitations, we make three key technical contributions toward a more practical multi-view 3D reconstruction framework. First, we introduce an update model that progressively incorporates additional input views to improve the reconstruction. Second, we propose a hexa-plane neural SDF representation that better recovers detailed textures, geometry, and material parameters. Third, we develop a novel neural directional-embedding mechanism to handle view-dependent effects. Trained on a large-scale shape and material dataset with a tailored coarse-to-fine training scheme, our model achieves compelling results. It compares favorably to optimization-based dense-view inverse rendering methods in terms of geometry and relighting accuracy, while requiring only a fraction of the inference time.
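To make the hexa-plane idea concrete: like a tri-plane, a hexa-plane stores features on axis-aligned 2D grids, and a 3D point is decoded by projecting it onto each plane, bilinearly sampling, and feeding the gathered features to a small MLP that predicts SDF and material values. The sketch below is a minimal illustration assuming six planes arranged as two per axis pair; the exact layout and decoder in the paper may differ, and `query_features` is a hypothetical name.

```python
import numpy as np

def bilinear_sample(plane, uv):
    """Bilinearly sample a (H, W, C) feature plane at normalized uv in [0, 1]^2."""
    H, W, _ = plane.shape
    x, y = uv[0] * (W - 1), uv[1] * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * plane[y0, x0]
            + wx * (1 - wy) * plane[y0, x1]
            + (1 - wx) * wy * plane[y1, x0]
            + wx * wy * plane[y1, x1])

def query_features(planes, p):
    """Project a 3D point p in [-1, 1]^3 onto each plane and concatenate features.

    `planes` holds six (H, W, C) grids, assumed here to be two per axis pair
    (an illustrative hexa-plane layout, not necessarily the paper's).
    """
    x, y, z = (p + 1.0) / 2.0  # map to [0, 1]
    uvs = [(x, y), (x, y), (x, z), (x, z), (y, z), (y, z)]
    feats = [bilinear_sample(pl, uv) for pl, uv in zip(planes, uvs)]
    return np.concatenate(feats)  # would feed an MLP predicting SDF + materials

planes = [np.random.default_rng(0).normal(size=(16, 16, 4)) for _ in range(6)]
f = query_features(planes, np.array([0.1, -0.3, 0.5]))
print(f.shape)  # (24,)
```

Doubling the plane count over a tri-plane gives the decoder more capacity per spatial direction at modest memory cost, which is one plausible reason such a representation helps with fine textures and material detail.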