🤖 AI Summary
Gaussian Splatting (GS) suffers from appearance and shadow inconsistencies in 3D object-scene compositing because appearance and shadows are baked into the radiance field, leaving lighting non-editable. To address this, we propose a relightable compositing framework: Surface Octahedral Probes (SOPs) replace ray tracing for efficient local illumination and occlusion queries and real-time soft shadow computation; full scene lighting estimation is simplified to reconstructing a localized radiance field at the object's placement, with missing illumination completed by a fine-tuned diffusion model; and inverse rendering is combined with 360° radiance field capture to enable high-fidelity editing. Experiments demonstrate real-time rendering at around 28 FPS, sub-minute editing latency (36 seconds per edit), and visually harmonious results with physically plausible shadows. Our method significantly enhances GS-driven immersive content creation while preserving geometric fidelity and lighting coherence.
📝 Abstract
Gaussian Splatting (GS) enables immersive rendering, but realistic 3D object-scene composition remains challenging. Baked appearance and shadow information in GS radiance fields causes inconsistencies when combining objects and scenes. Addressing this requires relightable object reconstruction and scene lighting estimation. For relightable object reconstruction, existing Gaussian-based inverse rendering methods often rely on ray tracing, leading to low efficiency. We introduce Surface Octahedral Probes (SOPs), which store lighting and occlusion information and allow efficient 3D querying via interpolation, avoiding expensive ray tracing. SOPs provide at least a 2× speedup in reconstruction and enable real-time shadow computation in Gaussian scenes. For lighting estimation, existing Gaussian-based inverse rendering methods struggle to model intricate light transport and often fail in complex scenes, while learning-based methods predict lighting from a single image and are viewpoint-sensitive. We observe that 3D object-scene composition primarily concerns the object's appearance and nearby shadows. Thus, we simplify the challenging task of full scene lighting estimation by focusing on the environment lighting at the object's placement. Specifically, we capture a 360° radiance field of the reconstructed scene at that location and fine-tune a diffusion model to complete the lighting. Building on these advances, we propose ComGS, a novel 3D object-scene composition framework. Our method achieves high-quality, real-time rendering at around 28 FPS, produces visually harmonious results with vivid shadows, and requires only 36 seconds for editing. Code and dataset are available at https://nju-3dv.github.io/projects/ComGS/.
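The core efficiency idea behind SOPs is that lighting and occlusion values stored at discrete probes can be queried at any 3D point by interpolation instead of tracing rays. The paper does not publish its query routine here, so the following is a minimal illustrative sketch under assumed conventions: probes are laid out on a regular 3D grid storing a per-probe scalar (e.g. a visibility term), and a query point blends the eight surrounding probes trilinearly. The function name, grid layout, and stored quantity are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def trilinear_query(grid, origin, spacing, p):
    """Interpolate a per-probe scalar field at a 3D point.

    grid:    (Nx, Ny, Nz) array of values stored at probe positions
             (hypothetical layout; real SOPs store octahedral maps)
    origin:  world-space position of probe index (0, 0, 0)
    spacing: distance between adjacent probes
    p:       query point, shape (3,)
    """
    # Continuous grid coordinates of the query point
    g = (np.asarray(p, dtype=float) - origin) / spacing
    # Index of the lower-corner probe, clamped so the 2x2x2 cell stays in bounds
    i0 = np.clip(np.floor(g).astype(int), 0, np.array(grid.shape) - 2)
    f = g - i0  # fractional offsets within the cell, each in [0, 1]

    # Gather the 8 surrounding probe values and blend with trilinear weights
    c = grid[i0[0]:i0[0] + 2, i0[1]:i0[1] + 2, i0[2]:i0[2] + 2]
    wx = np.array([1.0 - f[0], f[0]])
    wy = np.array([1.0 - f[1], f[1]])
    wz = np.array([1.0 - f[2], f[2]])
    return np.einsum('i,j,k,ijk->', wx, wy, wz, c)
```

A constant-time lookup like this is why probe queries can replace per-sample ray tracing during both reconstruction and real-time shadow evaluation: the cost is independent of scene complexity.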