🤖 AI Summary
To address the limited scalability and real-world robustness of 3D geometric reconstruction models, this paper introduces MoRE, which it presents as the first Mixture-of-Experts (MoE)-based 3D vision foundation model tailored for dense geometry estimation. The method has three components: (i) a dynamically routed MoE architecture that enables adaptive, multi-granularity feature allocation; (ii) a confidence-guided depth refinement module that stabilizes geometric estimation; and (iii) a surface normal prediction head that combines dense semantic features with a globally aligned 3D backbone, with all tasks optimized jointly under multi-task objectives. The model achieves state-of-the-art performance on mainstream benchmarks including ScanNet and SUN RGB-D. It supports downstream applications without extra computation and shows improved cross-task generalization and robustness in complex, real-world scenes.
📝 Abstract
Recent advances in language and vision have demonstrated that scaling up model capacity consistently improves performance across diverse tasks. In 3D visual geometry reconstruction, large-scale training has likewise proven effective for learning versatile representations. However, further scaling of 3D models is challenging due to the complexity of geometric supervision and the diversity of 3D data. To overcome these limitations, we propose MoRE, a dense 3D visual foundation model based on a Mixture-of-Experts (MoE) architecture that dynamically routes features to task-specific experts, allowing them to specialize in complementary data aspects and enhance both scalability and adaptability. Aiming to improve robustness under real-world conditions, MoRE incorporates a confidence-based depth refinement module that stabilizes and refines geometric estimation. In addition, it integrates dense semantic features with globally aligned 3D backbone representations for high-fidelity surface normal prediction. MoRE is further optimized with tailored loss functions to ensure robust learning across diverse inputs and multiple geometric tasks. Extensive experiments demonstrate that MoRE achieves state-of-the-art performance across multiple benchmarks and supports effective downstream applications without extra computation.
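The dynamic routing described in the abstract can be illustrated with a generic top-k gated MoE layer. The following is a minimal sketch under stated assumptions, not MoRE's actual implementation: the abstract does not specify the gating function, the number of experts, or the expert architecture, so the softmax-over-top-k gate, the single-matrix ReLU experts, and all names (`moe_forward`, `gate_w`, `expert_ws`, `top_k`) are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_forward(tokens, gate_w, expert_ws, top_k=2):
    """Generic top-k MoE layer (illustrative; not the paper's design).

    tokens:    (n, d) feature vectors
    gate_w:    (d, E) router weights (hypothetical linear router)
    expert_ws: (E, d, d) one weight matrix per expert (hypothetical experts)
    """
    logits = tokens @ gate_w                                   # (n, E) router scores
    top_idx = np.argsort(-logits, axis=-1)[:, :top_k]          # (n, k) chosen experts
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)  # (n, k)
    weights = softmax(top_logits, axis=-1)                     # renormalize over chosen experts

    out = np.zeros_like(tokens)
    for e in range(expert_ws.shape[0]):
        # Tokens routed to expert e, and the top-k slot in which e appears.
        token_ids, slot = np.nonzero(top_idx == e)
        if token_ids.size == 0:
            continue
        expert_out = np.maximum(tokens[token_ids] @ expert_ws[e], 0.0)  # ReLU expert
        out[token_ids] += weights[token_ids, slot][:, None] * expert_out
    return out, top_idx, weights

# Example with random features: 6 tokens, 8 dimensions, 4 experts.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))
gate_w = rng.normal(size=(8, 4))
expert_ws = rng.normal(size=(4, 8, 8))
out, top_idx, weights = moe_forward(tokens, gate_w, expert_ws, top_k=2)
```

In this sketch each token is processed only by its `top_k` highest-scoring experts, so total capacity grows with the number of experts while per-token compute stays roughly fixed, which is the general scalability argument behind MoE designs.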