🤖 AI Summary
To address the computational redundancy and data movement overhead caused by decoupling preprocessing from rendering in mobile 3D Gaussian Splatting (3DGS), this paper proposes GCC, a hardware accelerator architecture. GCC introduces three key innovations: (1) a cross-stage dynamic skipping mechanism that conditionally executes preprocessing based on real-time rendering requirements, thereby avoiding unnecessary Gaussian preprocessing; (2) Gaussian-granularity rendering scheduling, which eliminates the repeated loading of the same Gaussian across tiles; and (3) an alpha-based boundary identification method that derives compact, accurate effective regions for each Gaussian. Implemented in 28 nm CMOS technology, GCC achieves 2.1× higher energy efficiency and 1.8× higher throughput compared to the state-of-the-art accelerator GSCore. Furthermore, it reduces redundant computation by 63% and memory accesses by 57%.
📝 Abstract
3D Gaussian Splatting (3DGS) has emerged as a leading neural rendering technique for high-fidelity view synthesis, prompting the development of dedicated 3DGS accelerators for mobile applications. Through in-depth analysis, we identify two major limitations in the conventional decoupled preprocessing-rendering dataflow adopted by existing accelerators: 1) a significant portion of preprocessed Gaussians is never used in rendering, and 2) the same Gaussian is repeatedly loaded while rendering different tiles. Both result in substantial computational and data movement overhead. To address these issues, we propose GCC, a novel accelerator designed for fast and energy-efficient 3DGS inference. At the dataflow level, GCC introduces: 1) cross-stage conditional processing, which interleaves preprocessing and rendering to dynamically skip unnecessary Gaussian preprocessing; and 2) Gaussian-wise rendering, which completes all rendering operations for a given Gaussian before moving to the next, thereby eliminating duplicated Gaussian loading. We also propose an alpha-based boundary identification method to derive compact and accurate Gaussian regions, thereby reducing rendering costs. We implement our GCC accelerator in 28 nm technology. Extensive experiments demonstrate that GCC significantly outperforms the state-of-the-art 3DGS inference accelerator, GSCore, in both performance and energy efficiency.
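The three ideas in the abstract can be illustrated with a small software sketch. This is not the GCC hardware dataflow, which the abstract does not specify in detail; it is a minimal Python model under stated assumptions. It assumes each Gaussian already has a projected 2D mean, 2D covariance, and opacity, and that the list is sorted front to back. The names `alpha_bbox`, `render_gaussian_wise`, and `shade`, and the thresholds `ALPHA_MIN` (1/255) and `T_MIN` (1e-4), are hypothetical choices for illustration. The `shade` callback stands in for whatever preprocessing can be skipped. The bounding box comes from solving `opacity * exp(-q / 2) >= ALPHA_MIN` for the Mahalanobis distance `q`, rather than using a fixed 3-sigma radius.

```python
import math

ALPHA_MIN = 1.0 / 255.0  # assumed minimum alpha that still contributes to a pixel
T_MIN = 1e-4             # assumed transmittance below which a pixel counts as saturated


def alpha_bbox(mean, cov, opacity, width, height):
    """Pixel bounding box (x0, x1, y0, y1) of the region where alpha >= ALPHA_MIN.

    Returns None if the Gaussian cannot contribute to any pixel.
    """
    if opacity < ALPHA_MIN:
        return None
    # opacity * exp(-q / 2) >= ALPHA_MIN  <=>  q <= tau
    tau = 2.0 * math.log(opacity / ALPHA_MIN)
    (a, _), (_, c) = cov
    # Axis-aligned half-extents of the ellipse d^T cov^-1 d <= tau.
    hx, hy = math.sqrt(tau * a), math.sqrt(tau * c)
    x0 = max(0, math.ceil(mean[0] - hx))
    x1 = min(width - 1, math.floor(mean[0] + hx))
    y0 = max(0, math.ceil(mean[1] - hy))
    y1 = min(height - 1, math.floor(mean[1] + hy))
    if x0 > x1 or y0 > y1:
        return None
    return x0, x1, y0, y1


def render_gaussian_wise(gaussians, width, height, shade):
    """Blend front-to-back sorted Gaussians one at a time (grayscale).

    Each Gaussian is a dict with keys "mean", "cov", "opacity". `shade(g)` models
    the skippable preprocessing and is called only when the Gaussian can still
    change at least one pixel. Returns (image, number_of_shade_calls).
    """
    image = [[0.0] * width for _ in range(height)]
    trans = [[1.0] * width for _ in range(height)]  # per-pixel transmittance
    shaded = 0
    for g in gaussians:
        box = alpha_bbox(g["mean"], g["cov"], g["opacity"], width, height)
        if box is None:
            continue
        x0, x1, y0, y1 = box
        # Conditional skip: every pixel in the region is already saturated.
        if all(trans[y][x] < T_MIN
               for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)):
            continue
        color = shade(g)
        shaded += 1
        (a, b), (_, c) = g["cov"]
        det = a * c - b * b
        # Finish every pixel this Gaussian touches before loading the next one.
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                if trans[y][x] < T_MIN:
                    continue
                dx, dy = x - g["mean"][0], y - g["mean"][1]
                q = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
                alpha = min(0.99, g["opacity"] * math.exp(-0.5 * q))
                if alpha < ALPHA_MIN:
                    continue
                image[y][x] += trans[y][x] * alpha * color
                trans[y][x] *= 1.0 - alpha
    return image, shaded
```

In this model each Gaussian is read once, regardless of how many tiles its region would span in a tile-based renderer, and Gaussians hidden behind saturated pixels never reach `shade`. Under these assumptions, a low-opacity Gaussian gets a tighter box than a 3-sigma bound would give, while a fully opaque one gets a slightly larger box.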