SLTarch: Towards Scalable Point-Based Neural Rendering by Taming Workload Imbalance and Memory Irregularity

📅 2025-07-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Point-based neural rendering (PBNR) on mobile GPUs suffers from three critical bottlenecks: load imbalance in level-of-detail (LoD) search, irregular memory access patterns, and severe thread divergence during splatting. This paper proposes an algorithm–architecture co-design: (1) SLTree, a subtree-structured data organization, coupled with the LTcore hardware unit, enables low-overhead, highly parallel LoD search; and (2) a divergence-free splatting algorithm integrated with the SPcore hardware augmentation eliminates thread-level control-flow divergence. The approach requires only lightweight extensions to existing GPU hardware. Evaluation shows a 3.9× speedup and 98% energy reduction over a baseline mobile GPU, and 1.8× higher performance with 54% lower energy versus state-of-the-art accelerators, at negligible area overhead. The core contribution is a systematic integration of data-structure innovation, domain-specific hardware design, and algorithmic restructuring that explicitly addresses the sparsity and irregularity inherent to mobile PBNR workloads.
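The load-balancing idea behind a subtree-structured hierarchy can be sketched in a few lines. This is an illustrative toy, not the paper's SLTree or LTcore: it only shows how cutting a point hierarchy every few levels yields independent, similarly sized subtrees that parallel workers can consume without the root-to-leaf traversal imbalance of a monolithic tree. All names (`Node`, `partition_subtrees`, `subtree_depth`) are hypothetical.

```python
# Hypothetical sketch: partition a point hierarchy into fixed-depth
# subtrees so LoD-search work units are uniform across parallel workers.
from dataclasses import dataclass, field

@dataclass
class Node:
    depth: int
    children: list = field(default_factory=list)

def build_tree(depth, fanout=2):
    """Build a complete hierarchy of the given depth (stand-in for a point tree)."""
    node = Node(depth)
    if depth > 0:
        node.children = [build_tree(depth - 1, fanout) for _ in range(fanout)]
    return node

def partition_subtrees(root, subtree_depth):
    """Cut the hierarchy every `subtree_depth` levels; each cut root
    becomes an independent, bounded-size unit of LoD-search work."""
    units, frontier = [], [root]
    while frontier:
        node = frontier.pop()
        units.append(node)
        # Descend subtree_depth levels; the nodes reached there
        # re-enter the frontier as roots of the next subtrees.
        layer = [node]
        for _ in range(subtree_depth):
            layer = [c for n in layer for c in n.children]
        frontier.extend(layer)
    return units

root = build_tree(depth=4)
units = partition_subtrees(root, subtree_depth=2)
print(len(units))  # 21 subtree roots (depths 4, 2, 0: 1 + 4 + 16)
```

Because every unit spans at most `subtree_depth` levels, the per-worker traversal cost is bounded regardless of where a query lands in the hierarchy, which is the property the summary attributes to the SLTree/LTcore pairing.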

📝 Abstract
Rendering is critical in fields like 3D modeling, AR/VR, and autonomous driving, where high-quality, real-time output is essential. Point-based neural rendering (PBNR) offers a photorealistic and efficient alternative to conventional methods, yet it is still challenging to achieve real-time rendering on mobile platforms. We pinpoint two major bottlenecks in PBNR pipelines: LoD search and splatting. LoD search suffers from workload imbalance and irregular memory access, making it inefficient on off-the-shelf GPUs. Meanwhile, splatting introduces severe warp divergence across GPU threads due to its inherent sparsity. To tackle these challenges, we propose SLTarch, an algorithm-architecture co-designed framework. At its core, SLTarch introduces SLTree, a dedicated subtree-based data structure, and LTcore, a specialized hardware architecture tailored for efficient LoD search. Additionally, we co-design a divergence-free splatting algorithm with our simple yet principled hardware augmentation, SPcore, to existing PBNR accelerators. Compared to a mobile GPU, SLTarch achieves 3.9$\times$ speedup and 98% energy savings with negligible architecture overhead. Compared to existing accelerator designs, SLTarch achieves 1.8$\times$ speedup with 54% energy savings.
Problem

Research questions and friction points this paper is trying to address.

Address workload imbalance in LoD search
Reduce memory irregularity in point-based rendering
Minimize warp divergence during splatting operations
Innovation

Methods, ideas, or system contributions that make the work stand out.

SLTree data structure for efficient LoD search
LTcore hardware architecture for LoD search
Divergence-free splatting algorithm with SPcore
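The divergence-free splatting idea in the innovation list can be illustrated with a toy compaction pass. This is not SLTarch's actual kernel: it only shows the general technique of expanding sparse splat-to-tile coverage into a dense work list up front, so that the subsequent per-entry pass does identical work everywhere and no thread needs a "does my splat touch this tile?" branch. All names (`covered_tiles`, `build_work_list`, the `bbox` layout) are hypothetical.

```python
# Hypothetical sketch: compact sparse (splat, tile) coverage pairs into
# a dense work list, removing per-thread branching from the blend pass.
def covered_tiles(splat, tile_size=16):
    """Tiles overlapped by a splat's screen-space bounding box."""
    x0, y0, x1, y1 = splat["bbox"]
    return [(tx, ty)
            for tx in range(x0 // tile_size, x1 // tile_size + 1)
            for ty in range(y0 // tile_size, y1 // tile_size + 1)]

def build_work_list(splats):
    """Compaction pass: one entry per (splat, tile) pair, so every
    entry in the later blend pass carries the same amount of work."""
    return [(s, t) for s in splats for t in covered_tiles(s)]

splats = [
    {"bbox": (0, 0, 10, 10), "color": 0.2},   # covers 1 tile
    {"bbox": (0, 0, 40, 10), "color": 0.5},   # covers 3 tiles
]
work = build_work_list(splats)
print(len(work))  # 4 uniform work items (1 + 3)
```

On a GPU, warps that iterate over a dense list like this follow a single control path, which is the divergence-elimination property the summary ascribes to the co-designed algorithm and SPcore.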