🤖 AI Summary
Existing large-scale scene reconstruction methods typically adopt a patch-based (divide-and-conquer) optimization paradigm, which sacrifices global consistency and demands complex hyperparameter tuning. This paper introduces an end-to-end holistic modeling framework that jointly encodes camera poses and Gaussian attributes, yielding globally consistent yet locally detailed 3D representations. The core contributions are: (1) a view-aware joint encoding-decoding mechanism; (2) hybrid Gaussian rendering with parameterization; and (3) a progressive hybrid decoding strategy. To the authors' knowledge, this is the first method that enables full-scene training on city-scale scenes with a single 24 GB GPU. It achieves state-of-the-art rendering quality on large-scale scenes, trains significantly faster, and reduces GPU memory consumption by 57%.
📝 Abstract
Recent advances in 3D Gaussian Splatting have shown remarkable potential for novel view synthesis. However, most existing large-scale scene reconstruction methods rely on the divide-and-conquer paradigm, which often loses global scene information and requires complex parameter tuning due to scene partitioning and local optimization. To address these limitations, we propose MixGS, a novel holistic optimization framework for large-scale 3D scene reconstruction. MixGS models the entire scene holistically by integrating camera pose and Gaussian attributes into a view-aware representation, which is decoded into finely detailed Gaussians. Furthermore, a novel mixing operation combines decoded and original Gaussians to jointly preserve global coherence and local fidelity. Extensive experiments on large-scale scenes demonstrate that MixGS achieves state-of-the-art rendering quality and competitive rendering speed while significantly reducing computational requirements, enabling large-scale scene reconstruction training on a single GPU with 24 GB of VRAM. The code will be released at https://github.com/azhuantou/MixGS.
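The abstract's pipeline (encode camera pose together with Gaussian attributes into a view-aware representation, decode that into fine-detail Gaussians, then mix decoded and original Gaussians) can be sketched at a high level as follows. This is a minimal illustrative sketch only: the function names, the concatenation-based encoder, and the single linear decoder are assumptions for exposition, not the paper's actual architecture.

```python
# Hypothetical sketch of a MixGS-style encode -> decode -> mix pipeline.
# All names and the toy linear decoder are illustrative assumptions,
# not the released implementation.
import numpy as np

rng = np.random.default_rng(0)

def encode_view(camera_pose, gaussian_attrs):
    """Fuse a camera pose with per-Gaussian attributes into a
    view-aware code (placeholder: simple concatenation)."""
    pose = np.broadcast_to(camera_pose,
                           (gaussian_attrs.shape[0], camera_pose.shape[0]))
    return np.concatenate([pose, gaussian_attrs], axis=1)

def decode_gaussians(codes, weight):
    """Decode view-aware codes into fine-detail Gaussian attributes
    (placeholder: a single linear map standing in for the decoder)."""
    return codes @ weight

def mix_gaussians(original, decoded):
    """The 'mixing operation': render from the union of the original
    (globally coherent) and decoded (locally detailed) Gaussians."""
    return np.concatenate([original, decoded], axis=0)

# Toy example: 100 original Gaussians with 10 attributes each, a 6-DoF pose.
original = rng.standard_normal((100, 10))
pose = rng.standard_normal(6)
weight = rng.standard_normal((16, 10))  # 6 pose dims + 10 attrs -> 10 attrs

codes = encode_view(pose, original)        # (100, 16)
decoded = decode_gaussians(codes, weight)  # (100, 10)
mixed = mix_gaussians(original, decoded)   # (200, 10)
print(mixed.shape)
```

The point of the sketch is the data flow: the decoded set doubles the Gaussians available at render time, which is consistent with the paper's claim of preserving global coherence (original set) while adding local fidelity (decoded set).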