MACE: Mixture-of-Experts Accelerated Coordinate Encoding for Large-Scale Scene Localization and Rendering

📅 2025-10-15
🤖 AI Summary
Efficient localization and high-fidelity rendering in large-scale scenes are hindered by high computational cost and the limited capacity of single-network architectures. To address these, we propose a Mixture-of-Experts (MoE)-accelerated scene coordinate regression framework: a lightweight gating network dynamically selects a single expert per inference pass, ensuring sparse activation, and an auxiliary-loss-free load-balancing strategy keeps expert utilization balanced without auxiliary objectives. Our method improves both localization accuracy and training efficiency while maintaining low computational overhead. Experiments on the Cambridge Landmarks dataset show that our approach attains high-precision pose estimation and photorealistic rendering within just 10 minutes of training, substantially reducing computational cost compared to state-of-the-art methods. Specifically, it lowers median pose error by 23.6% and improves rendering PSNR by 1.8 dB.

📝 Abstract
Efficient localization and high-quality rendering in large-scale scenes remain a significant challenge due to the computational cost involved. While Scene Coordinate Regression (SCR) methods perform well in small-scale localization, they are limited by the capacity of a single network when extended to large-scale scenes. To address these challenges, we propose the Mixed Expert-based Accelerated Coordinate Encoding method (MACE), which enables efficient localization and high-quality rendering in large-scale scenes. Inspired by the remarkable capabilities of MoE in large-model domains, we introduce a gating network to implicitly classify and select sub-networks, ensuring that only a single sub-network is activated during each inference. Furthermore, we present an Auxiliary-Loss-Free Load Balancing (ALF-LB) strategy to enhance localization accuracy in large-scale scenes. Our framework provides a significant reduction in costs while maintaining higher precision, offering an efficient solution for large-scale scene applications. Additional experiments on the Cambridge test set demonstrate that our method achieves high-quality rendering results with merely 10 minutes of training.
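
The single-expert routing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear gate, the linear expert heads, the sizes `FEATURE_DIM` and `NUM_EXPERTS`, and all function names are hypothetical stand-ins chosen only to show top-1 selection, where the gate scores every expert but only the winning expert is evaluated.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_linear(rng, out_dim, in_dim):
    """Create a random (weights, bias) pair; a stand-in for a trained layer."""
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias

def linear(weights, bias, x):
    """Apply y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def route_top1(gate, feature):
    """Return (expert index, gate probability) of the highest-scoring expert."""
    probs = softmax(linear(*gate, feature))
    idx = max(range(len(probs)), key=probs.__getitem__)
    return idx, probs[idx]

def predict_scene_coordinate(gate, experts, feature):
    """Evaluate only the selected expert head and return its 3D coordinate."""
    idx, _ = route_top1(gate, feature)
    return idx, linear(*experts[idx], feature)

rng = random.Random(0)
FEATURE_DIM, NUM_EXPERTS = 8, 4  # hypothetical sizes, not taken from the paper
gate = make_linear(rng, NUM_EXPERTS, FEATURE_DIM)
experts = [make_linear(rng, 3, FEATURE_DIM) for _ in range(NUM_EXPERTS)]
feature = [rng.uniform(-1, 1) for _ in range(FEATURE_DIM)]

expert_idx, xyz = predict_scene_coordinate(gate, experts, feature)
print("selected expert:", expert_idx, "predicted coordinate:", xyz)
```

Because only one expert head runs per query, the per-inference cost in this sketch stays close to that of a single small network regardless of how many experts exist, which is the property the abstract attributes to sparse activation.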
Problem

Research questions and friction points this paper is trying to address.

Addresses computational cost challenges in large-scale scene localization
Overcomes single network limitations for scene coordinate regression
Enables efficient localization and high-quality rendering in large scenes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Experts network enables large-scale scene localization
Gating network activates single sub-network per inference
Auxiliary-Loss-Free Load Balancing enhances localization accuracy
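
The page does not spell out how ALF-LB works, so the following is only a sketch of one common auxiliary-loss-free scheme, assumed here for illustration: each expert carries a routing bias that is added to its gate score for selection only, and after every batch the bias is lowered for overloaded experts and raised for underloaded ones. The step size, the synthetic skewed scores, and every function name are hypothetical.

```python
import random

def select_expert(scores, bias):
    """Pick the expert with the highest bias-adjusted score (top-1)."""
    adjusted = [s + b for s, b in zip(scores, bias)]
    return max(range(len(adjusted)), key=adjusted.__getitem__)

def update_bias(bias, counts, step=0.01):
    """Lower the bias of overloaded experts and raise it for underloaded ones."""
    mean_load = sum(counts) / len(counts)
    new_bias = []
    for b, c in zip(bias, counts):
        if c > mean_load:
            new_bias.append(b - step)
        elif c < mean_load:
            new_bias.append(b + step)
        else:
            new_bias.append(b)
    return new_bias

def simulate(num_experts=4, batches=200, batch_size=64, step=0.01, seed=0):
    """Route synthetic, deliberately skewed gate scores and adapt the bias."""
    rng = random.Random(seed)
    skew = [0.5 * i for i in range(num_experts)]  # later experts are favoured
    bias = [0.0] * num_experts
    counts = [0] * num_experts
    for _ in range(batches):
        counts = [0] * num_experts
        for _ in range(batch_size):
            scores = [rng.gauss(skew[i], 1.0) for i in range(num_experts)]
            counts[select_expert(scores, bias)] += 1
        bias = update_bias(bias, counts, step)
    return bias, counts

final_bias, last_counts = simulate()
print("final bias:", final_bias)
print("expert loads in last batch:", last_counts)
```

No extra loss term appears anywhere in this sketch: balancing acts purely through the selection bias, so it does not add a competing gradient to whatever objective trains the experts.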
Authors

Mingkai Liu
PICO, ByteDance Inc.
Dikai Fan
PICO, ByteDance Inc.
Haohua Que
University of Georgia, Athens, USA.
Haojia Gao
Beijing University of Technology, Beijing, China.
Xiao Liu
PICO, ByteDance Inc.
Shuxue Peng
PICO, ByteDance Inc.
Meixia Lin
PICO, ByteDance Inc.
Shengyu Gu
PICO, ByteDance Inc.
Ruicong Ye
Peking University, Beijing, China.
Wanli Qiu
Peking University, Beijing, China.
Handong Yao
University of Georgia
Ruopeng Zhang
Chongqing Vocational Institute of Engineering, Chongqing, China.
Xianliang Huang
PICO, ByteDance Inc.