🤖 AI Summary
Neural rendering for real-time 3D immersive experiences on edge devices faces two key bottlenecks: the absence of cross-scene, general-purpose algorithms and the restriction of existing hardware accelerators to specific rendering pipelines. This paper introduces the first unified neural rendering accelerator designed for edge devices, featuring a novel reconfigurable hardware architecture that supports diverse mainstream and emerging hybrid pipelines—including NeRF, Gaussian Splatting (GS), and diffusion-based renderers. Leveraging operator-sharing-driven reconfigurable dataflows, customized tensor units, and dynamic workload scheduling, the accelerator adapts in real time to heterogeneous rendering requirements. Experimental evaluation demonstrates ≥30 FPS real-time performance on both synthetic and real-world scenes, with a 2.1× improvement in energy efficiency over state-of-the-art specialized accelerators. To our knowledge, this is the first work to achieve unified, cross-pipeline, cross-scene, and energy-efficient neural rendering acceleration at the edge.
📝 Abstract
Recent advancements in neural rendering technologies and their supporting devices have paved the way for immersive 3D experiences, significantly transforming human interaction with intelligent devices across diverse applications. However, achieving the real-time rendering speeds required for immersive interaction is still hindered by (1) the lack of a universal algorithmic solution for different application scenarios and (2) the restriction of existing devices and accelerators to specific rendering pipelines. To overcome these challenges, we have developed a unified neural rendering accelerator that caters to a wide array of typical neural rendering pipelines, enabling real-time, on-device rendering across different applications while maintaining both efficiency and compatibility. Our accelerator design is based on the insight that, although neural rendering pipelines vary and their algorithm designs are continually evolving, they typically share common operators and predominantly execute similar workloads. Building on this insight, we propose a reconfigurable hardware architecture that can dynamically adjust its dataflow to meet the rendering metric requirements of diverse applications, effectively supporting both typical and the latest hybrid rendering pipelines. Benchmarking experiments and ablation studies on both synthetic and real-world scenes demonstrate the effectiveness of the proposed accelerator. The proposed unified accelerator stands out as the first solution capable of achieving real-time neural rendering across varied representative pipelines on edge devices, potentially paving the way for the next generation of neural graphics applications.
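To make the operator-sharing insight concrete: NeRF and Gaussian Splatting differ in how they produce per-sample opacities, but both ultimately feed the same front-to-back alpha-compositing operator, which is exactly the kind of shared workload a reconfigurable datapath can serve. The sketch below is illustrative only (the function names and decomposition are my assumptions, not the paper's actual hardware interface); it uses NumPy to show the shared compositing core with pipeline-specific front ends.

```python
import numpy as np

def composite(alphas, colors):
    """Shared operator: front-to-back alpha compositing.
    C = sum_i T_i * alpha_i * c_i, with transmittance
    T_i = prod_{j<i} (1 - alpha_j). Both NeRF (per-ray samples)
    and Gaussian Splatting (per-pixel sorted splats) reduce to this."""
    transmittance = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
    return (transmittance[:, None] * alphas[:, None] * colors).sum(axis=0)

def nerf_alphas(sigmas, dt):
    """NeRF front end: opacities from volume density and step size,
    alpha_i = 1 - exp(-sigma_i * dt)."""
    return 1.0 - np.exp(-sigmas * dt)

def gs_alphas(opacities, gaussian_weights):
    """GS front end (simplified): each splat's learned opacity scaled
    by its 2D Gaussian falloff at the pixel."""
    return opacities * gaussian_weights

# Either front end feeds the same back-end operator:
colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
pixel_nerf = composite(nerf_alphas(np.array([2.0, 50.0]), dt=0.1), colors)
pixel_gs = composite(gs_alphas(np.array([0.5, 1.0]), np.array([1.0, 1.0])), colors)
```

In a reconfigurable accelerator, only the lightweight alpha-generation stage would be swapped per pipeline while the compositing datapath stays fixed, which is one plausible reading of the "operator-sharing-driven reconfigurable dataflow" idea.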