🤖 AI Summary
Addressing key obstacles in AR and embodied intelligence (multi-view illumination inconsistency, shadow distortion, and the poor scalability of inverse rendering for object compositing), this paper proposes a two-stage feed-forward compositing framework. Stage one achieves geometric-semantic alignment between 2D images and 3D Gaussian scenes via a Hilbert curve mapping; stage two bypasses iterative diffusion and directly predicts illumination and shadows for efficient, photorealistic compositing. Key contributions include: (1) the first large-scale benchmark dataset designed specifically for 3D object compositing; (2) the first lightweight framework integrating Hilbert-space mapping with feed-forward inverse rendering; and (3) state-of-the-art harmonization scores on both standard and custom benchmarks, enabling real-time inference and demonstrating strong generalization and robustness on real-world smartphone-captured scenes.
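The Hilbert curve provides a locality-preserving way to flatten a 2D grid into a 1D sequence, which is what lets image pixels and similarly ordered 3D Gaussian tokens share a common sequence layout. Below is a minimal sketch of the standard 2D Hilbert index and how it might serialize an image feature map; the paper's exact mapping is not reproduced here, so the grid size, feature shapes, and serialization step are illustrative assumptions.

```python
import numpy as np

def xy2d(n: int, x: int, y: int) -> int:
    """Hilbert-curve index of cell (x, y) on an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate/reflect so sub-squares nest correctly
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Serialize an n x n x C feature map into a locality-preserving 1D token sequence:
# nearby pixels stay nearby in the sequence, so the 2D tokens can line up with a
# comparably ordered sequence of 3D Gaussian tokens (shapes here are hypothetical).
n = 16                                    # grid side, power of two
feats = np.random.rand(n, n, 32)          # stand-in for per-pixel features
coords = [(x, y) for y in range(n) for x in range(n)]
coords.sort(key=lambda p: xy2d(n, *p))
tokens = np.stack([feats[y, x] for x, y in coords])   # shape (n*n, 32)
```

Compared with row-major (raster) flattening, the Hilbert order avoids the large jumps that occur at the end of each scanline, which is why it is a common choice when a sequence model needs spatial neighbors to stay close in token order.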
📝 Abstract
Object compositing offers significant promise for augmented reality (AR) and embodied intelligence applications. Existing approaches predominantly target single-image scenarios or intrinsic decomposition techniques, and struggle with multi-view consistency, complex scenes, and diverse lighting conditions. Recent advances in inverse rendering, such as 3D Gaussian- and diffusion-based methods, have improved consistency but remain limited by scalability, heavy data requirements, or long per-scene reconstruction times. To broaden applicability, we introduce MV-CoLight, a two-stage framework for illumination-consistent object compositing in both 2D images and 3D scenes. Our novel feed-forward architecture models lighting and shadows directly, avoiding the iterative biases of diffusion-based methods. We employ a Hilbert curve-based mapping to align 2D image inputs seamlessly with 3D Gaussian scene representations. To facilitate training and evaluation, we further introduce a large-scale 3D compositing dataset. Experiments demonstrate state-of-the-art harmonized results on standard benchmarks and our dataset, while casually captured real-world scenes further demonstrate the framework's robustness and broad generalization.
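In contrast with a diffusion sampler, which refines its output over many denoising steps, a feed-forward design amortizes the entire prediction into a single network call. The sketch below is purely schematic (the module, its layers, and tensor shapes are hypothetical, not MV-CoLight's actual architecture) and only illustrates why one pass supports real-time inference.

```python
import torch
import torch.nn as nn

class FeedForwardRelighter(nn.Module):
    """Hypothetical one-pass predictor: scene + object features -> lit composite."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(6, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
        )
        self.to_rgb = nn.Conv2d(dim, 3, 1)      # harmonized composite
        self.to_shadow = nn.Conv2d(dim, 1, 1)   # predicted shadow map

    def forward(self, scene_rgb, object_rgb):
        h = self.backbone(torch.cat([scene_rgb, object_rgb], dim=1))
        return self.to_rgb(h), self.to_shadow(h)

# One forward pass replaces the K denoising iterations a diffusion sampler would run:
model = FeedForwardRelighter()
scene = torch.rand(1, 3, 64, 64)
obj = torch.rand(1, 3, 64, 64)
composite, shadow = model(scene, obj)    # single call, real-time-friendly
```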