🤖 AI Summary
This work addresses appearance inconsistency in 3D Gaussian splatting for in-the-wild scenes, where transient occluders and lighting variations produce conflicting gradients across views and lead to reconstruction artifacts. To mitigate this, the authors propose a conflict-aware 3D Gaussian splatting framework with three components. Semantic consistency-guided pixel-level masks suppress unreliable observations before gradients are formed. A dual-view gradient harmonization mechanism reduces gradient conflicts by rotating the two views' gradients toward each other until they are orthogonal. A conflict-driven densification and pruning strategy stabilizes Gaussian growth and removes persistently conflicting primitives. On multiple in-the-wild benchmarks, the authors report fewer rendering artifacts and state-of-the-art rendering quality.
📝 Abstract
In-the-wild 3D Gaussian Splatting remains challenging due to transient distractors and illumination-induced cross-view appearance inconsistencies. Existing methods mainly rely on image-level masking to suppress unreliable supervision, but masking alone cannot fully eliminate residual occlusions or resolve illumination-induced inconsistencies, both of which can introduce conflicting cross-view gradients. These unresolved conflicts may destabilize Gaussian optimization and lead to visible reconstruction artifacts. We propose a conflict-aware 3DGS framework that addresses this problem from both image-space supervision and gradient-level optimization. Semantic Consistency-Guided Masking learns pixel-wise consistency scores to adaptively refine prior masks and suppress unreliable supervision before gradient formation. A dual-view Conflict-Aware Gradient Harmonization strategy further reconciles view-specific gradients by mutually rotating them into an orthogonal configuration, reducing negative directional interference across views. We also introduce conflict-aware densification and pruning to stabilize Gaussian growth and remove persistently conflicting primitives. Extensive experiments on standard in-the-wild benchmarks demonstrate that our method achieves state-of-the-art rendering quality under complex transient distractors and cross-view inconsistencies.
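The abstract does not give the exact update rule for Conflict-Aware Gradient Harmonization, so the following is only a minimal sketch of one way to "mutually rotate" two conflicting gradients into an orthogonal configuration. It assumes that a conflict means a negative cosine similarity, that both gradients are rotated by the same angle within the plane they span, and that their norms are preserved. The function name `harmonize_pair` and the parameter `eps` are hypothetical and do not come from the paper.

```python
import numpy as np


def harmonize_pair(g1, g2, eps=1e-12):
    """Rotate two conflicting gradients symmetrically until they are orthogonal.

    Assumption (not from the paper): a pair conflicts when its cosine
    similarity is negative; each vector is then rotated by the same angle
    inside the plane spanned by g1 and g2, keeping its norm unchanged.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    n1, n2 = np.linalg.norm(g1), np.linalg.norm(g2)
    if n1 < eps or n2 < eps:
        return g1, g2  # a zero gradient cannot conflict

    cos_theta = np.clip(g1 @ g2 / (n1 * n2), -1.0, 1.0)
    if cos_theta >= 0.0:
        return g1, g2  # no conflict: leave both gradients untouched

    # Orthonormal basis (u1, u2) of the plane spanned by g1 and g2.
    u1 = g1 / n1
    w = g2 - (g2 @ u1) * u1
    w_norm = np.linalg.norm(w)
    if w_norm < eps:
        return g1, g2  # exactly opposite: rotation plane is undefined

    u2 = w / w_norm

    # In this basis g1 sits at angle 0 and g2 at angle theta in (pi/2, pi).
    theta = np.arccos(cos_theta)
    delta = 0.5 * (theta - np.pi / 2)  # each vector closes half of the excess angle

    g1_new = n1 * (np.cos(delta) * u1 + np.sin(delta) * u2)
    g2_new = n2 * (np.cos(theta - delta) * u1 + np.sin(theta - delta) * u2)
    return g1_new, g2_new


# Example: two view-specific gradients that are 135 degrees apart.
a, b = harmonize_pair([1.0, 0.0], [-1.0, 1.0])
```

After the call, `a` and `b` are orthogonal and keep their original lengths, so neither view's update has a component that directly opposes the other. A symmetric rotation is one design choice among several; projecting one gradient onto the normal plane of the other would also remove the negative component, but it would shrink that gradient and treat the two views asymmetrically.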