🤖 AI Summary
To address the trade-off between high visual fidelity and low latency in real-time VR rendering, this paper proposes a dynamic foveated dual-stream rendering framework grounded in human visual perception. The peripheral region is rendered with a streamlined 3D Gaussian splatting pass for efficient, smooth output, while the gaze-aligned foveal region uses a lightweight neural point representation driven by a CNN to recover high-fidelity detail. The authors integrate perceptual modeling directly into the radiance field rendering pipeline, balancing computational load against perceived quality, and introduce the first foveated dual-stream representation that unifies 3D Gaussian splatting and neural points, augmented with point-cloud occlusion culling for additional efficiency. Evaluated on Quest 3-class hardware, the method sustains over 90 FPS, surpasses a standard 3D Gaussian Splatting (3DGS) baseline in edge sharpness and texture detail, and achieves a 92% user preference rate, meeting the latency and immersion requirements of interactive VR applications.
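As a rough illustration of the dual-stream idea (not the authors' implementation), the minimal PyTorch sketch below composites a gaze-centered foveal image, standing in for the neural-point + CNN stream, over a full-frame peripheral render from the Gaussian stream, using a smooth radial falloff around the gaze point. All function and parameter names here (`composite_foveated`, `fovea_radius`, `blend_width`) are hypothetical assumptions, and rendering the foveal stream at full frame size is a simplification; in practice it would cover only a small gaze-centered crop.

```python
import torch

def composite_foveated(peripheral: torch.Tensor,
                       foveal: torch.Tensor,
                       gaze_uv: torch.Tensor,
                       fovea_radius: float = 0.15,
                       blend_width: float = 0.03) -> torch.Tensor:
    """Blend a high-detail foveal render over a cheap peripheral render.

    peripheral: (3, H, W) full-frame image from the lightweight Gaussian stream.
    foveal:     (3, H, W) image from the neural-point + CNN stream (simplified
                here to full frame; really only a small gaze-centered crop).
    gaze_uv:    (2,) gaze position in normalized [0, 1] image coordinates.
    """
    _, H, W = peripheral.shape
    ys = torch.linspace(0.0, 1.0, H).view(H, 1).expand(H, W)
    xs = torch.linspace(0.0, 1.0, W).view(1, W).expand(H, W)
    # Radial distance from each pixel to the gaze point (aspect-corrected in x).
    aspect = W / H
    dist = torch.sqrt(((xs - gaze_uv[0]) * aspect) ** 2 + (ys - gaze_uv[1]) ** 2)
    # Smooth transition: weight 1 inside the fovea, falling to 0 in the periphery.
    weight = torch.clamp((fovea_radius + blend_width - dist) / blend_width, 0.0, 1.0)
    return weight * foveal + (1.0 - weight) * peripheral

# Toy usage with random images standing in for the two render streams.
if __name__ == "__main__":
    periph = torch.rand(3, 720, 720)   # hypothetical per-eye resolution
    fovea = torch.rand(3, 720, 720)
    frame = composite_foveated(periph, fovea, gaze_uv=torch.tensor([0.5, 0.5]))
    print(frame.shape)  # torch.Size([3, 720, 720])
```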
📝 Abstract
Recent advances in novel view synthesis have demonstrated impressive results in fast photorealistic scene rendering through differentiable point rendering, either via Gaussian Splatting (3DGS) [Kerbl and Kopanas et al. 2023] or neural point rendering [Aliev et al. 2020]. Unfortunately, these directions require either a large number of small Gaussians or expensive per-pixel post-processing to reconstruct fine details, which negatively impacts rendering performance. To meet the high performance demands of virtual reality (VR) systems, primitive or pixel counts must therefore be kept low, which affects visual quality. In this paper, we propose a novel hybrid approach based on foveated rendering that combines the performance sweet spots of both point rendering directions. Analyzing compatibility with the human visual system, we find that a smooth, low-detail Gaussian representation with few primitives is cheap to compute and meets the perceptual demands of peripheral vision. For the fovea only, we use neural points with a convolutional neural network; thanks to the small pixel footprint, this provides sharp, detailed output within the rendering budget. The combination also enables synergistic accelerations, namely point occlusion culling and reduced demands on the neural network. Our evaluation confirms that our approach increases sharpness and detail compared to a standard VR-ready 3DGS configuration, and participants of a user study overwhelmingly preferred our method. Our system meets the performance requirements for real-time VR interaction, ultimately enhancing the user's immersive experience. The project page can be found at: https://lfranke.github.io/vr_splatting
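The point occlusion culling mentioned above suggests that neural points hidden behind surfaces already covered by a depth pre-pass can be discarded before the CNN ever processes them. The sketch below is one plausible, simplified reading of that idea, not the paper's actual implementation: it tests each projected point against a per-pixel depth buffer assumed to come from a coarse pre-pass, and the function name, the nearest-pixel lookup, and the `eps` tolerance are all assumptions.

```python
import torch

def cull_occluded_points(points_uv: torch.Tensor,
                         point_depths: torch.Tensor,
                         depth_buffer: torch.Tensor,
                         eps: float = 1e-2) -> torch.Tensor:
    """Return a boolean mask of neural points that survive occlusion culling.

    points_uv:    (N, 2) projected point positions in normalized [0, 1] coords.
    point_depths: (N,)   view-space depth of each neural point.
    depth_buffer: (H, W) depth of the closest surface per pixel, assumed to be
                  produced by a coarse pre-pass (e.g. the Gaussian stream).
    """
    H, W = depth_buffer.shape
    # Nearest-pixel lookup of the occluding surface depth under each point.
    px = (points_uv[:, 0] * (W - 1)).round().long().clamp(0, W - 1)
    py = (points_uv[:, 1] * (H - 1)).round().long().clamp(0, H - 1)
    surface_depth = depth_buffer[py, px]
    # Keep a point only if it is not behind the surface (within a tolerance),
    # so the neural network never has to process features from hidden points.
    return point_depths <= surface_depth + eps

# Toy usage: 10k random points tested against a random depth buffer.
if __name__ == "__main__":
    pts = torch.rand(10_000, 2)
    depths = torch.rand(10_000) * 10.0
    zbuf = torch.rand(720, 720) * 10.0
    visible = cull_occluded_points(pts, depths, zbuf)
    print(int(visible.sum()), "of", pts.shape[0], "points kept")
```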