MV-CoLight: Efficient Object Compositing with Consistent Lighting and Shadow Generation

📅 2025-05-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing challenges in AR and embodied intelligence—namely multi-view illumination inconsistency, shadow distortion, and the poor scalability of inverse rendering for object compositing—this paper proposes a two-stage feed-forward compositing framework. Stage one achieves geometric-semantic alignment between 2D images and 3D Gaussian scenes via Hilbert curve mapping; stage two bypasses iterative diffusion and directly predicts illumination and shadows for efficient, photorealistic compositing. Key contributions include: (1) the first large-scale benchmark dataset specifically designed for 3D object compositing; (2) the first lightweight framework integrating Hilbert-space mapping with feed-forward inverse rendering; and (3) state-of-the-art harmony scores on both standard and custom benchmarks, enabling real-time inference and demonstrating strong generalization and robustness on real-world smartphone-captured scenes.

📝 Abstract
Object compositing offers significant promise for augmented reality (AR) and embodied intelligence applications. Existing approaches predominantly focus on single-image scenarios or intrinsic decomposition techniques, facing challenges with multi-view consistency, complex scenes, and diverse lighting conditions. Recent inverse rendering advancements, such as 3D Gaussian and diffusion-based methods, have enhanced consistency but are limited by scalability, heavy data requirements, or prolonged per-scene reconstruction time. To broaden applicability, we introduce MV-CoLight, a two-stage framework for illumination-consistent object compositing in both 2D images and 3D scenes. Our novel feed-forward architecture models lighting and shadows directly, avoiding the iterative biases of diffusion-based methods. We employ a Hilbert curve-based mapping to align 2D image inputs with 3D Gaussian scene representations seamlessly. To facilitate training and evaluation, we further introduce a large-scale 3D compositing dataset. Experiments demonstrate state-of-the-art harmonized results across standard benchmarks and our dataset; results on casually captured real-world scenes further demonstrate the framework's robustness and wide generalization.
Problem

Research questions and friction points this paper is trying to address.

Achieving multi-view lighting and shadow consistency in object compositing
Overcoming scalability and data limitations in inverse rendering methods
Enhancing realism in 2D/3D scenes with efficient feed-forward architecture
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage framework for consistent object compositing
Feed-forward architecture for direct lighting modeling
Hilbert curve-based 2D-3D alignment technique
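The Hilbert curve ordering behind the 2D-3D alignment technique serializes a grid into a 1D sequence while preserving spatial locality, so nearby pixels stay nearby in the sequence. As an illustration only (the paper's actual mapping between image inputs and Gaussian scenes is not reproduced here), a minimal sketch of the standard 2D coordinate-to-Hilbert-index conversion:

```python
def hilbert_index(n, x, y):
    """Map grid coordinates (x, y) to a 1D Hilbert-curve index.

    n: grid side length (a power of two); x, y: integers in [0, n).
    Standard bitwise xy-to-d conversion; illustrative, not the paper's code.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so each recursion level stays locality-preserving.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Serialize an 8x8 grid: consecutive sequence positions are always
# grid neighbors, unlike a plain row-major (raster) ordering.
order = sorted(((x, y) for x in range(8) for y in range(8)),
               key=lambda p: hilbert_index(8, *p))
```

Sorting tokens by this index gives a 1D sequence in which spatially adjacent features remain adjacent, which is the property that lets a shared sequence model consume both 2D image grids and 3D scene representations consistently.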
Kerui Ren
Shanghai Jiao Tong University, Shanghai AI Laboratory
3D Reconstruction, Neural Rendering

Jiayang Bai
Nanjing University

Linning Xu
The Chinese University of Hong Kong

Lihan Jiang
USTC, Shanghai AI Laboratory
Neural Rendering, 3D Reconstruction

Jiangmiao Pang
Shanghai Artificial Intelligence Laboratory

Mulin Yu
Shanghai AI Laboratory; INRIA
3D Reconstruction and 3D Repairing

Bo Dai
The University of Hong Kong