3DGauCIM: Accelerating Static/Dynamic 3D Gaussian Splatting via Digital CIM for High Frame Rate Real-Time Edge Rendering

📅 2025-07-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of high DRAM access overhead, substantial sorting latency, low on-chip cache utilization, and poor compatibility with digital compute-in-memory (DCIM) architectures in real-time rendering of dynamic 3D Gaussian Splatting (3DGS) on edge devices, this work proposes an algorithm–hardware co-design methodology. Specifically, we introduce a DRAM-aware frustum culling mechanism, an adaptive tile grouping strategy, a lightweight Bucket-Bitonic sorting algorithm, and a DCIM-optimized computation flow. Implemented on a 16-nm DCIM prototype chip, our solution achieves over 200 FPS real-time rendering for both large-scale static and dynamic scenes, consuming only 0.28 W and 0.63 W, respectively. This represents a significant improvement in energy efficiency and frame rate. To the best of our knowledge, this is the first complete software–hardware co-designed solution for edge-deployable 3DGS that natively supports DCIM acceleration.
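The summary names a lightweight Bucket-Bitonic sorting algorithm for depth ordering. The paper's exact formulation (including its adaptive interval initialization) is not reproduced here; the following is a minimal sketch of the general idea, assuming uniform bucket intervals over the observed depth range: splat depths are first scattered into coarse buckets, and each small bucket is then sorted with a bitonic network, the kind of fixed compare-and-swap pattern that maps well to hardware sorters.

```python
def bitonic_sort(vals):
    """Sort a list ascending with a bitonic network (pads to a power of two)."""
    n = 1
    while n < len(vals):
        n <<= 1
    a = list(vals) + [float("inf")] * (n - len(vals))  # sentinel padding

    def merge(lo, cnt, up):
        if cnt > 1:
            k = cnt // 2
            for i in range(lo, lo + k):
                if (a[i] > a[i + k]) == up:  # compare-and-swap stage
                    a[i], a[i + k] = a[i + k], a[i]
            merge(lo, k, up)
            merge(lo + k, k, up)

    def sort(lo, cnt, up):
        if cnt > 1:
            k = cnt // 2
            sort(lo, k, True)        # ascending half
            sort(lo + k, k, False)   # descending half -> bitonic sequence
            merge(lo, cnt, up)
    sort(0, n, True)
    return a[: len(vals)]

def bucket_bitonic_sort(depths, num_buckets=4):
    """Bucket depths by uniform intervals over [min, max], then
    bitonic-sort each bucket; concatenation is globally sorted."""
    lo, hi = min(depths), max(depths)
    width = (hi - lo) / num_buckets or 1.0
    buckets = [[] for _ in range(num_buckets)]
    for d in depths:
        buckets[min(int((d - lo) / width), num_buckets - 1)].append(d)
    out = []
    for b in buckets:
        out.extend(bitonic_sort(b))
    return out
```

Because each bucket is far smaller than the full splat list, the O(n log²n) bitonic networks stay shallow, which is where the latency reduction comes from.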

📝 Abstract
Dynamic 3D Gaussian splatting (3DGS) extends static 3DGS to render dynamic scenes, enabling AR/VR applications with moving objects. However, implementing dynamic 3DGS on edge devices faces several challenges: (1) loading all Gaussian parameters from DRAM for frustum culling incurs high energy costs; (2) the increased parameter count of dynamic scenes raises sorting latency and energy consumption; (3) limited on-chip buffer capacity combined with larger parameter counts reduces buffer reuse, causing frequent DRAM accesses; and (4) dynamic 3DGS operations are not readily compatible with digital compute-in-memory (DCIM). These challenges hinder real-time performance and power efficiency on edge devices, shortening battery life or requiring bulky batteries. To tackle these challenges, we propose algorithm-hardware co-design techniques. At the algorithmic level, we introduce three optimizations: (1) DRAM-access-reduction frustum culling to lower DRAM access overhead; (2) adaptive tile grouping to enhance on-chip buffer reuse; and (3) adaptive-interval-initialization Bucket-Bitonic sort to reduce sorting latency. At the hardware level, we present a DCIM-friendly computation flow that is evaluated using measured data from a 16 nm DCIM prototype chip. Experimental results on large-scale real-world static and dynamic datasets demonstrate high-frame-rate real-time rendering exceeding 200 frames per second (FPS) with minimal power consumption: merely 0.28 W for static scenes and 0.63 W for dynamic scenes. This work addresses the significant challenges of implementing static/dynamic 3DGS technology on resource-constrained edge devices.
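The abstract's first optimization targets the cost of fetching every Gaussian's full parameter set from DRAM just to decide visibility. The paper's specific mechanism is not detailed here; a minimal sketch of the general idea is to test only a compact array of Gaussian centers against the view frustum, so the full parameters are loaded only for survivors. The matrix convention and the `guard` margin below are illustrative assumptions, not the paper's values.

```python
import numpy as np

def cull_by_center(centers, view_proj, guard=1.3):
    """Coarse frustum culling on Gaussian centers only.

    centers:   (N, 3) world-space means (the compact per-Gaussian data).
    view_proj: (4, 4) view-projection matrix, row-vector convention
               (clip = [x y z 1] @ view_proj) -- an assumption here.
    guard:     margin > 1 in NDC units so splats whose footprint may
               still overlap the screen are kept (hypothetical value).
    Returns a boolean keep-mask; only kept Gaussians would have their
    full parameter sets fetched from DRAM.
    """
    ones = np.ones((centers.shape[0], 1))
    clip = np.hstack([centers, ones]) @ view_proj   # (N, 4) clip coords
    w = clip[:, 3]
    in_front = w > 1e-6                             # reject behind-camera points
    safe_w = np.where(in_front, w, 1.0)             # avoid divide-by-zero
    ndc = clip[:, :3] / safe_w[:, None]             # perspective divide
    inside = np.all(np.abs(ndc) <= guard, axis=1)   # inside (padded) frustum?
    return in_front & inside
```

The point of the sketch is the data layout, not the test itself: streaming a 12-byte center per Gaussian instead of the full attribute record is what cuts DRAM traffic during culling.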
Problem

Research questions and friction points this paper is trying to address.

Reducing DRAM access energy for dynamic 3D Gaussian splatting
Minimizing sorting latency in dynamic scene parameter processing
Enhancing on-chip buffer reuse for edge device efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

DRAM-access reduction frustum culling
Adaptive tile grouping for on-chip buffer reuse
Adaptive interval initialization Bucket-Bitonic sort
DCIM-friendly computation flow
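The adaptive tile grouping contribution listed above can be sketched as a greedy pass over screen tiles: consecutive tiles are merged into one group as long as the union of Gaussian IDs they touch fits the on-chip buffer, so Gaussians shared across tiles in a group are fetched from DRAM once rather than per tile. The grouping policy and data structures below are illustrative assumptions, not the paper's implementation.

```python
def group_tiles(tile_gaussians, buffer_capacity):
    """Greedily group consecutive screen tiles under a buffer budget.

    tile_gaussians:  list of sets of Gaussian IDs per tile, in screen order.
    buffer_capacity: max number of distinct Gaussians the on-chip buffer holds.
    Returns a list of (tile_indices, working_set) groups; each group's
    working set is loaded from DRAM once and reused for all its tiles.
    """
    groups = []
    cur_tiles, cur_set = [], set()
    for t, ids in enumerate(tile_gaussians):
        merged = cur_set | ids
        if cur_tiles and len(merged) > buffer_capacity:
            groups.append((cur_tiles, cur_set))   # close the full group
            cur_tiles, cur_set = [t], set(ids)    # start a new one
        else:
            cur_tiles.append(t)
            cur_set = merged
    if cur_tiles:
        groups.append((cur_tiles, cur_set))
    return groups
```

With neighboring tiles covered by largely the same splats, one grouped fetch replaces several redundant per-tile fetches, which is the buffer-reuse effect the bullet describes.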