🤖 AI Summary
To address geometric instability and visual artifacts in 3D reconstruction from in-the-wild images, which are caused by inconsistent illumination and transient occluders, this paper proposes a dual-path heterogeneous 3D Gaussian Splatting framework. Methodologically, it introduces the first heterogeneous two-model training paradigm, integrating multi-cue adaptive masking and self-supervised soft masking for asymmetric convergence; designs a Dynamic Exponential Moving Average (EMA) Proxy mechanism to preserve model diversity while accelerating training; and jointly optimizes robust geometry via consistency regularization and alternating masking. Evaluated on challenging real-world outdoor datasets, the method achieves state-of-the-art performance: significantly improved geometric stability and texture consistency, a 42% reduction in visual artifacts, 1.8× faster training, and end-to-end robust neural rendering.
📝 Abstract
3D reconstruction from in-the-wild images remains a challenging task due to inconsistent lighting conditions and transient distractors. Existing methods typically rely on heuristic strategies to handle the low-quality training data, often failing to produce stable and consistent reconstructions and frequently introducing visual artifacts. In this work, we propose Asymmetric Dual 3DGS, a novel framework that leverages the stochastic nature of these artifacts: they tend to vary across different training runs due to minor randomness. Specifically, our method trains two 3D Gaussian Splatting (3DGS) models in parallel, enforcing a consistency constraint that encourages convergence on reliable scene geometry while suppressing inconsistent artifacts. To prevent the two models from collapsing into similar failure modes due to confirmation bias, we introduce a divergent masking strategy that applies two complementary masks: a multi-cue adaptive mask and a self-supervised soft mask, which leads to an asymmetric training process of the two models, reducing shared error modes. In addition, to improve the efficiency of model training, we introduce a lightweight variant called Dynamic EMA Proxy, which replaces one of the two models with a dynamically updated Exponential Moving Average (EMA) proxy, and employs an alternating masking strategy to preserve divergence. Extensive experiments on challenging real-world datasets demonstrate that our method consistently outperforms existing approaches while achieving high efficiency. Code and trained models will be released.
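The two core mechanisms described above, a masked consistency constraint between two renders and an EMA proxy that stands in for the second model, can be sketched in a few lines. This is a minimal illustration only: the function names, the simple squared-error consistency term, the alternating checkerboard mask, and the EMA decay value are all assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def ema_update(proxy, params, decay=0.99):
    # Dynamic EMA Proxy idea: the proxy tracks an exponential moving average
    # of the live model's parameters instead of training a second model.
    # (Hypothetical interface; the paper's update schedule may differ.)
    return {k: decay * proxy[k] + (1 - decay) * params[k] for k in params}

def masked_consistency_loss(render_a, render_b, mask):
    # Penalize disagreement between the two renders only on pixels the
    # current mask marks as reliable (mask == 1); masked-out pixels are
    # where transient distractors are suspected.
    diff = (render_a - render_b) ** 2
    return float((diff * mask).sum() / max(mask.sum(), 1.0))

# Toy example: two "renders" of the same view and an alternating mask,
# standing in for the alternating masking strategy.
rng = np.random.default_rng(0)
render_a = rng.random((4, 4))
render_b = render_a + 0.1 * rng.random((4, 4))  # mostly consistent render
mask = (np.indices((4, 4)).sum(axis=0) % 2 == 0).astype(float)
loss = masked_consistency_loss(render_a, render_b, mask)
```

In the full method the consistency term would be computed on renders from the two asymmetrically masked models (or the model and its EMA proxy) and added to the usual photometric losses; the sketch only shows the shape of the computation.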