🤖 AI Summary
To address computational redundancy and hardware stalls in real-time 3D Gaussian Splatting (3DGS) rendering on edge devices, this paper proposes LS-Gaussian, a lightweight streaming rendering framework. Methodologically, it introduces: (1) a viewpoint-transformation-driven sparse rendering algorithm that dynamically culls invisible Gaussians by exploiting inter-frame viewpoint continuity; (2) a tile-level workload-aware prediction model for fine-grained load-balanced scheduling; and (3) a customized 3DGS hardware accelerator enabling low-overhead real-time mapping. Evaluation shows LS-Gaussian achieves an average 5.41× speedup on edge GPUs and up to 17.3× on the custom accelerator, with negligible visual quality degradation (PSNR > 38 dB). The framework significantly enhances both real-time rendering throughput and energy efficiency on resource-constrained edge platforms.
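The viewpoint-transformation idea in item (1) can be illustrated with a generic depth-based reprojection. The sketch below is an assumption, not the paper's algorithm. It forward-warps the previous frame's pixels into the new camera using a per-pixel depth map, pinhole intrinsics `K`, and a 4×4 relative pose. It then marks only the tiles that contain uncovered pixels (disocclusions or newly visible borders) as needing a fresh render. The function names `reproject_coverage` and `tiles_to_render` and the 16-pixel tile size are hypothetical.

```python
import numpy as np


def reproject_coverage(depth_prev, K, T_prev_to_cur):
    """Forward-warp previous-frame pixels into the current view.

    Returns a boolean (H, W) mask of current-frame pixels that received at
    least one warped sample. No z-test is done: we only need coverage, not
    which sample wins.
    """
    H, W = depth_prev.shape
    v, u = np.mgrid[0:H, 0:W]
    # Homogeneous pixel-centre coordinates, shape (3, N).
    pix = np.stack([u.ravel() + 0.5, v.ravel() + 0.5, np.ones(H * W)])
    # Back-project to 3D points in the previous camera frame.
    pts = (np.linalg.inv(K) @ pix) * depth_prev.ravel()
    valid = depth_prev.ravel() > 0
    # Apply the relative pose (previous camera -> current camera).
    pts_cur = (T_prev_to_cur @ np.vstack([pts, np.ones(H * W)]))[:3]
    ok = valid & (pts_cur[2] > 1e-6)  # keep points in front of the camera
    proj = K @ pts_cur[:, ok]
    u2 = np.floor(proj[0] / proj[2]).astype(int)
    v2 = np.floor(proj[1] / proj[2]).astype(int)
    inb = (u2 >= 0) & (u2 < W) & (v2 >= 0) & (v2 < H)
    covered = np.zeros((H, W), dtype=bool)
    covered[v2[inb], u2[inb]] = True
    return covered


def tiles_to_render(covered, tile=16):
    """Flag tiles containing any uncovered pixel; only these are re-rendered."""
    H, W = covered.shape
    th, tw = -(-H // tile), -(-W // tile)  # ceiling division
    holes = ~covered
    need = np.zeros((th, tw), dtype=bool)
    for ty in range(th):
        for tx in range(tw):
            need[ty, tx] = holes[ty * tile:(ty + 1) * tile,
                                 tx * tile:(tx + 1) * tile].any()
    return need
```

With an identity pose every pixel is covered and no tile needs re-rendering; a small sideways camera motion uncovers a thin strip at one image border, so only the tiles along that edge are flagged. A real system would also have to handle occlusion ordering and view-dependent appearance, which this sketch ignores.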
📝 Abstract
3D Gaussian Splatting (3DGS) enables high-quality rendering of 3D scenes and is seeing increasing adoption in domains such as autonomous driving and embodied intelligence. However, 3DGS still faces major efficiency challenges under high-frame-rate requirements and resource-constrained edge deployment. To enable efficient 3DGS, we propose LS-Gaussian, an algorithm/hardware co-design framework for lightweight streaming 3D rendering. LS-Gaussian is motivated by the core observation that 3DGS suffers from substantial computational redundancy and hardware stalls. On the one hand, in practical scenarios, high-frame-rate 3DGS is often applied in settings where a camera continuously observes and renders the same scene from slightly different viewpoints. Therefore, instead of rendering each frame independently, LS-Gaussian proposes a viewpoint transformation algorithm that leverages inter-frame continuity for efficient sparse rendering. On the other hand, because the tiles within an image are rendered in parallel but have imbalanced workloads, frequent hardware stalls further slow down rendering. LS-Gaussian predicts the workload of each tile based on the viewpoint transformation to enable more balanced parallel computation, and co-designs a customized 3DGS accelerator that supports this workload-aware mapping in real time. Experimental results demonstrate that LS-Gaussian achieves an average 5.41× speedup over the edge GPU baseline and up to 17.3× speedup with the customized accelerator, while incurring only minimal visual quality degradation.
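The workload-aware mapping can likewise be pictured as a scheduling problem. Assuming each tile's workload has already been predicted (for example, from the number of Gaussians that overlapped it in the previous, viewpoint-transformed frame; this predictor is an assumption), a classic greedy longest-processing-time (LPT) heuristic assigns the heaviest tiles first, each to the currently least-loaded unit. This is a generic stand-in, not the paper's scheduler or its accelerator mapping; `lpt_schedule`, `round_robin_makespan`, and the toy workloads are hypothetical.

```python
import heapq


def lpt_schedule(workloads, num_units):
    """Greedy LPT: place tiles in descending predicted cost on the least-loaded unit.

    Returns (assignment, makespan), where assignment[u] lists the tile
    indices mapped to unit u and makespan is the largest per-unit load.
    """
    heap = [(0, u) for u in range(num_units)]  # (current load, unit id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_units)]
    order = sorted(range(len(workloads)), key=lambda t: workloads[t], reverse=True)
    for tile in order:
        load, unit = heapq.heappop(heap)  # least-loaded unit so far
        assignment[unit].append(tile)
        heapq.heappush(heap, (load + workloads[tile], unit))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


def round_robin_makespan(workloads, num_units):
    """Workload-oblivious baseline: tile t goes to unit t % num_units."""
    loads = [0] * num_units
    for t, w in enumerate(workloads):
        loads[t % num_units] += w
    return max(loads)


if __name__ == "__main__":
    # Hypothetical predicted per-tile costs: two heavy tiles, six light ones.
    predicted = [8, 1, 1, 1, 7, 1, 1, 1]
    _, lpt = lpt_schedule(predicted, 2)
    rr = round_robin_makespan(predicted, 2)
    print(f"LPT makespan: {lpt}, round-robin makespan: {rr}")
    # -> LPT makespan: 11, round-robin makespan: 17
```

In this toy case the workload-oblivious assignment puts both heavy tiles on the same unit, leaving the other unit idle for most of the frame, which is the kind of stall the abstract describes. Ordering by predicted cost reaches the lower bound of ⌈21/2⌉ = 11 here, though LPT is only a heuristic in general.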