🤖 AI Summary
This work addresses the high memory consumption and computational latency that hinder deployment of Variational Bayesian Gaussian Splatting (VBGS) on edge devices. To this end, we propose an accuracy-adaptive optimization framework that preserves the variational formulation while significantly reducing resource overhead through automatic mixed-precision allocation, fusion of memory-intensive operators, and relative-error-constrained precision control. Our approach enables, for the first time, on-device training for novel view synthesis (NVS) on a commercial embedded platform (Jetson Orin Nano). Experiments demonstrate that on an A5000 GPU, peak memory usage drops from 9.44 GB to 1.11 GB and training time decreases from 234 to 61 minutes. On the Orin Nano, per-frame latency is reduced by 19× while maintaining or even improving reconstruction quality.
📝 Abstract
Novel view synthesis (NVS) is increasingly relevant for edge robotics, where compact and incrementally updatable 3D scene models are needed for SLAM, navigation, and inspection under tight memory and latency budgets. Variational Bayesian Gaussian Splatting (VBGS) enables replay-free continual updates for 3D Gaussian Splatting (3DGS) by maintaining a probabilistic scene model, but its high-precision computations and large intermediate tensors make on-device training impractical. We present a precision-adaptive optimization framework that enables VBGS training on resource-constrained hardware without altering its variational formulation. We (i) profile VBGS to identify memory/latency hotspots, (ii) fuse memory-dominant kernels to reduce materialized intermediate tensors, and (iii) automatically assign operation-level precisions via a mixed-precision search with bounded relative error. Across the Blender, Habitat, and Replica datasets, our optimized pipeline reduces peak memory from 9.44 GB to 1.11 GB and training time from ~234 min to ~61 min on an A5000 GPU, while preserving (and in some cases improving) the reconstruction quality of the state-of-the-art VBGS baseline. We also enable, for the first time, NVS training on a commercial embedded platform, the Jetson Orin Nano, reducing per-frame latency by 19× compared to 3DGS.
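Step (iii), the relative-error-constrained precision search, can be illustrated with a minimal sketch. The paper's actual search operates on VBGS operations; here, a hypothetical greedy variant over a toy NumPy pipeline demotes each op to the lowest floating-point precision whose output stays within a relative-error bound of a float64 reference. All names (`assign_precisions`, `ops`, `tol`) are illustrative assumptions, not the paper's API.

```python
# Hedged sketch (assumed, not the paper's implementation): greedy
# per-operation precision assignment under a relative-error bound.
import numpy as np

def rel_error(approx, ref):
    """Relative error ||approx - ref|| / ||ref||."""
    return np.linalg.norm(approx - ref) / np.linalg.norm(ref)

def assign_precisions(ops, x, tol=1e-3):
    """For each op, pick the lowest precision (fp16 -> fp32 -> fp64)
    whose output keeps the relative error vs. a float64 reference
    pass under `tol`. Returns the per-op precision plan."""
    plan = []
    ref = x.astype(np.float64)   # high-precision reference activations
    cur = x.astype(np.float64)   # activations of the mixed-precision run
    for op in ops:
        ref = op(ref)
        for dtype in (np.float16, np.float32):
            out = op(cur.astype(dtype)).astype(np.float64)
            if rel_error(out, ref) <= tol:
                chosen, cur = dtype, out
                break
        else:
            chosen, cur = np.float64, op(cur)  # fall back to full precision
        plan.append(chosen)
    return plan

# Toy pipeline of elementwise ops standing in for real kernels.
ops = [lambda t: t * 2.0, lambda t: np.exp(t * 0.01), lambda t: t + 1.0]
x = np.linspace(0.0, 1.0, 64)
plan = assign_precisions(ops, x, tol=1e-3)
```

A production version would measure error on held-out activations and amortize the search offline; the greedy per-op loop is only meant to show how the bound gates each demotion.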