GCC: A 3DGS Inference Architecture with Gaussian-Wise and Cross-Stage Conditional Processing

📅 2025-07-21
🤖 AI Summary
To address computational redundancy and data movement overheads arising from the decoupling of preprocessing and rendering in mobile 3D Gaussian Splatting (3DGS), this paper proposes GCC, a hardware acceleration architecture. GCC introduces three key innovations: (1) a cross-stage dynamic skipping mechanism that conditionally executes preprocessing based on real-time rendering requirements, thereby avoiding unnecessary Gaussian generation; (2) Gaussian-granularity rendering scheduling, which eliminates redundant tile-wise data reloading; and (3) an alpha-based boundary detection method for precise compression of effective Gaussian regions. Implemented in 28 nm CMOS technology, GCC achieves 2.1× higher energy efficiency and 1.8× higher throughput compared to the state-of-the-art accelerator GSCore. Furthermore, it reduces redundant computation by 63% and memory accesses by 57%.

📝 Abstract
3D Gaussian Splatting (3DGS) has emerged as a leading neural rendering technique for high-fidelity view synthesis, prompting the development of dedicated 3DGS accelerators for mobile applications. Through in-depth analysis, we identify two major limitations in the conventional decoupled preprocessing-rendering dataflow adopted by existing accelerators: 1) a significant portion of preprocessed Gaussians are not used in rendering, and 2) the same Gaussian gets repeatedly loaded across different tile renderings, resulting in substantial computational and data movement overhead. To address these issues, we propose GCC, a novel accelerator designed for fast and energy-efficient 3DGS inference. At the dataflow level, GCC introduces: 1) cross-stage conditional processing, which interleaves preprocessing and rendering to dynamically skip unnecessary Gaussian preprocessing; and 2) Gaussian-wise rendering, ensuring that all rendering operations for a given Gaussian are completed before moving to the next, thereby eliminating duplicated Gaussian loading. We also propose an alpha-based boundary identification method to derive compact and accurate Gaussian regions, thereby reducing rendering costs. We implement our GCC accelerator in 28nm technology. Extensive experiments demonstrate that GCC significantly outperforms the state-of-the-art 3DGS inference accelerator, GSCore, in both performance and energy efficiency.
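The dataflow described in the abstract can be illustrated with a small software sketch. It is not the GCC hardware design. It only shows the two ideas together: Gaussians are processed one at a time over the whole image (Gaussian-wise rendering), and preprocessing is skipped when every pixel the Gaussian could touch is already saturated (cross-stage conditional processing). The dictionary keys `depth`, `bbox`, and `preprocess`, the threshold `T_MIN`, and the pixel-center convention are all assumptions made for this example.

```python
import math

ALPHA_MIN = 1.0 / 255.0  # alpha cutoff commonly used in 3DGS rasterizers
T_MIN = 1e-4             # assumed transmittance threshold for "saturated" pixels


def render_gaussian_wise(gaussians, height, width):
    """Blend Gaussians front to back, finishing each one before the next.

    Each Gaussian is a dict with hypothetical keys:
      'depth'      - scalar used for front-to-back ordering
      'bbox'       - cheap coarse footprint (x0, y0, x1, y1) in pixels
      'preprocess' - callable returning ((mx, my), (a, b, c), opacity, rgb),
                     where (a, b, c) is the inverse 2D covariance (conic)
    """
    color = [[[0.0, 0.0, 0.0] for _ in range(width)] for _ in range(height)]
    trans = [[1.0] * width for _ in range(height)]  # per-pixel transmittance
    skipped = 0
    for g in sorted(gaussians, key=lambda g: g["depth"]):
        x0, y0, x1, y1 = g["bbox"]
        x0, y0 = max(x0, 0), max(y0, 0)
        x1, y1 = min(x1, width), min(y1, height)
        pixels = [(x, y) for y in range(y0, y1) for x in range(x0, x1)]
        # Cross-stage condition: if no covered pixel can still change,
        # the expensive preprocessing for this Gaussian is never run.
        if all(trans[y][x] < T_MIN for x, y in pixels):
            skipped += 1
            continue
        (mx, my), (a, b, c), opacity, rgb = g["preprocess"]()
        for x, y in pixels:
            t = trans[y][x]
            if t < T_MIN:
                continue  # this pixel has already terminated
            dx, dy = x + 0.5 - mx, y + 0.5 - my
            power = -0.5 * (a * dx * dx + 2.0 * b * dx * dy + c * dy * dy)
            alpha = min(0.99, opacity * math.exp(power))
            if alpha < ALPHA_MIN:
                continue  # contribution too small to matter
            for ch in range(3):
                color[y][x][ch] += t * alpha * rgb[ch]
            trans[y][x] = t * (1.0 - alpha)
    return color, skipped
```

Because each Gaussian's parameters are fetched once and applied to all of its pixels, nothing is reloaded per tile; the price in this sketch is a full-image transmittance buffer that must stay accessible throughout rendering.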
Problem

Research questions and friction points this paper is trying to address.

A significant portion of preprocessed Gaussians is never used in rendering
The same Gaussian is loaded repeatedly across different tile renderings
The resulting computation and data movement overhead limits 3DGS inference speed and energy efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-stage conditional processing for dynamic skipping
Gaussian-wise rendering to eliminate duplicate loading
Alpha-based boundary identification for compact regions
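One way to read the alpha-based boundary idea is as a closed-form bound, sketched below under stated assumptions rather than as the paper's exact method. If a Gaussian's contribution at offset d from its center is alpha = opacity * exp(-0.5 * dᵀ Σ⁻¹ d), and contributions below a cutoff of 1/255 are discarded, then the effective region is the ellipse dᵀ Σ⁻¹ d ≤ 2 ln(255 * opacity). The function names and the comparison against a generic 3-sigma radius are illustrative choices for this example.

```python
import math

ALPHA_MIN = 1.0 / 255.0  # assumed alpha cutoff


def alpha_bbox_half_extents(cov_xx, cov_yy, opacity):
    """Half-width and half-height of the box enclosing alpha >= ALPHA_MIN.

    Returns None when the Gaussian can never reach the cutoff.
    """
    if opacity <= ALPHA_MIN:
        return None
    # Squared Mahalanobis radius at which alpha drops to ALPHA_MIN.
    r2 = 2.0 * math.log(opacity / ALPHA_MIN)
    # Axis-aligned extent of the ellipse d^T inv(cov) d <= r2.
    return math.sqrt(r2 * cov_xx), math.sqrt(r2 * cov_yy)


def three_sigma_radius(cov_xx, cov_xy, cov_yy):
    """Opacity-agnostic square radius: 3 * sqrt(largest eigenvalue)."""
    mid = 0.5 * (cov_xx + cov_yy)
    det = cov_xx * cov_yy - cov_xy * cov_xy
    lam_max = mid + math.sqrt(max(0.0, mid * mid - det))
    return 3.0 * math.sqrt(lam_max)


half_w, half_h = alpha_bbox_half_extents(4.0, 1.0, 0.1)
radius = three_sigma_radius(4.0, 0.0, 1.0)
print(half_w < radius, half_h < radius)
```

For a faint, elongated Gaussian like the one in the example, the alpha-derived box is tighter than the 3-sigma square in both directions. For opacities close to 1 the alpha-derived extent can exceed 3 sigma slightly, so the bound is better described as accurate than as uniformly smaller.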
Authors
Minnan Pei
CASIA, UCAS
Artificial Intelligence, Hardware Architecture, 3D Vision
Gang Li
Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences; The Key Laboratory of Cognition and Decision Intelligence for Complex Systems
Junwen Si
Institute of Automation, Chinese Academy of Sciences; The Key Laboratory of Cognition and Decision Intelligence for Complex Systems
Zeyu Zhu
Yale University
Computational Social Science, Complexity in Social Science
Zitao Mo
Institute of Automation, Chinese Academy of Sciences; The Key Laboratory of Cognition and Decision Intelligence for Complex Systems
Peisong Wang
CASIA
Deep Neural Network Acceleration and Compression
Zhuoran Song
Shanghai Jiao Tong University
Xiaoyao Liang
Shanghai Jiao Tong University
Computer Architecture
Jian Cheng
Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences; The Key Laboratory of Cognition and Decision Intelligence for Complex Systems