🤖 AI Summary
This work proposes a novel post-processing paradigm for 3D reconstruction, introducing a Mixture-of-Experts (MoE) mechanism to address common artifacts such as blurred depth boundaries and flying pixels. Rather than regressing a single depth map as existing feed-forward approaches do, the method predicts multiple candidate depth maps and dynamically fuses them using learned per-pixel weights, effectively sharpening object boundaries and suppressing outliers with negligible computational overhead. The framework seamlessly integrates with pretrained backbone networks—such as VGGT—without retraining the backbone, thereby significantly enhancing both geometric accuracy and visual fidelity. This approach establishes an efficient, general-purpose post-processing strategy that advances the state of the art in high-quality 3D reconstruction.
📝 Abstract
We propose a simple yet effective approach to enhance the performance of feed-forward 3D reconstruction models. Existing methods often struggle near depth discontinuities, where standard regression losses encourage spatial averaging and thus blur sharp boundaries. To address this issue, we introduce a mixture-of-experts formulation that handles uncertainty at depth boundaries by combining multiple smooth depth predictions. A softmax weighting head dynamically selects among these hypotheses on a per-pixel basis. By integrating our mixture model into a pre-trained state-of-the-art 3D model, we achieve a substantial reduction in boundary artifacts and gains in overall reconstruction accuracy. Notably, our approach is highly compute-efficient: it delivers generalizable improvements even when fine-tuned on a small subset of the training data, while incurring only negligible additional inference computation, suggesting a promising direction for lightweight and accurate 3D reconstruction.
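To make the mixture-of-experts idea concrete, here is a minimal NumPy sketch of the core fusion step the abstract describes: a gating head produces per-pixel logits over K candidate depth hypotheses, and the fused depth is their softmax-weighted convex combination. The function names and array shapes are illustrative assumptions, not the authors' implementation (which presumably operates inside the pretrained backbone's decoder).

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax along the expert axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_depth_fusion(candidates, logits):
    """Fuse K candidate depth maps with per-pixel softmax weights.

    candidates: (K, H, W) smooth candidate depth hypotheses
    logits:     (K, H, W) unnormalized per-pixel gating scores
    returns:    (H, W) fused depth map
    """
    weights = softmax(logits, axis=0)          # per-pixel mixture weights, sum to 1
    return (weights * candidates).sum(axis=0)  # convex combination at each pixel
```

At a depth discontinuity, confident gating (large logit gaps) lets each pixel snap to a single hypothesis instead of averaging across the boundary; with uniform logits the fusion degenerates to the mean, which is exactly the blurring behavior the paper aims to avoid.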