🤖 AI Summary
To address the inefficient 2D texture cache utilization that hinders 3D Gaussian Splatting (3DGS) deployment on mobile GPUs, this paper proposes Texture3dgs, an efficient Gaussian rasterization method tailored for mobile devices. Its core innovation is a texture-cache-aware sorting algorithm, combined with an optimized variable layout and memory-access acceleration strategies, that improves utilization of 2D memory structures. Experiments show up to a 4.1× speedup in the sorting stage and up to a 1.7× end-to-end speedup in 3D scene reconstruction, while reducing peak memory consumption by up to 1.6×, all without compromising reconstruction quality. To the best of our knowledge, this is the first work to systematically resolve the texture cache bottleneck of 3DGS on mobile GPUs, establishing a deployable hardware–software co-optimization paradigm for lightweight, real-time radiance field modeling.
📝 Abstract
Image-based 3D scene reconstruction, which transforms multi-view images into a structured 3D representation of the surrounding environment, is a common task across many modern applications. 3D Gaussian Splatting (3DGS) is a new paradigm for this problem and is considerably more efficient than previous methods. Motivated by this, and considering the benefits of mobile device deployment (data privacy, operation without internet connectivity, and potentially faster responses), this paper develops Texture3dgs, an optimized mapping of 3DGS to a mobile GPU. A critical challenge in this setting turns out to be optimizing for the two-dimensional (2D) texture cache, which must be exploited for fast execution on mobile GPUs. Because sorting dominates the computation in 3DGS on mobile platforms, the core of Texture3dgs is a novel sorting algorithm in which processing, data movement, and data placement are highly optimized for 2D memory. The properties of this algorithm are analyzed using a cost model for the texture cache. In addition, we accelerate other steps of the 3DGS algorithm through improved variable layout design and other optimizations. End-to-end evaluation shows that Texture3dgs delivers up to 4.1$\times$ and 1.7$\times$ speedup for sorting and overall 3D scene reconstruction, respectively, while also reducing memory usage by up to 1.6$\times$, demonstrating the effectiveness of our design for efficient mobile 3D scene reconstruction.
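To illustrate why data placement matters for a 2D texture cache, the following minimal sketch counts how many cache tiles a contiguous run of elements touches under two layouts. It is not the Texture3dgs algorithm or its cost model: the 8×8 tile size, the 1024-texel texture width, and the helper names `row_major`, `blocked`, and `tiles_touched` are all assumptions made for illustration.

```python
# Illustrative sketch only: assumes a texture cache that fetches square
# TILE x TILE blocks of texels. Sizes and function names are hypothetical.

def row_major(i, width):
    """Map linear index i to (x, y) in a row-major 2D texture."""
    return i % width, i // width

def blocked(i, width, tile):
    """Map linear index i to (x, y) so that each run of tile*tile
    consecutive indices fills exactly one tile x tile block."""
    tiles_per_row = width // tile
    tile_id, within = divmod(i, tile * tile)
    tx, ty = tile_id % tiles_per_row, tile_id // tiles_per_row
    return tx * tile + within % tile, ty * tile + within // tile

def tiles_touched(coords, tile):
    """Count the distinct cache tiles covered by a set of texel coordinates."""
    return len({(x // tile, y // tile) for x, y in coords})

WIDTH, TILE = 1024, 8
run = range(64)  # one contiguous run of 64 elements, e.g. a sort bucket

row_major_tiles = tiles_touched([row_major(i, WIDTH) for i in run], TILE)
blocked_tiles = tiles_touched([blocked(i, WIDTH, TILE) for i in run], TILE)
print(row_major_tiles, blocked_tiles)  # prints "8 1"
```

Under these assumptions, the same 64 elements span eight tiles in the row-major layout but only one in the blocked layout, which is the kind of locality difference a texture-cache-aware sort would aim to exploit.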