🤖 AI Summary
Processing large-scale point clouds (hundreds of thousands of points) suffers from energy-efficiency bottlenecks due to O(n²) computational complexity and high memory-access overhead; existing accelerators are hindered by inefficient spatial partitioning and serial architectures that limit scalability. Method: This work proposes a fractal-inspired hardware architecture featuring a novel shape-aware fractal co-partitioning scheme for hardware-friendly point cloud tiling, coupled with a block-level parallel point operation mechanism leveraging on-chip fractal interconnects and fully parallel compute units to enable scalable processing under resource constraints. Contribution/Results: A dedicated chip fabricated in 28 nm CMOS occupies only 1.5 mm². It achieves 21.7× speedup and 27× energy-efficiency improvement over state-of-the-art accelerators, with zero accuracy loss.
📝 Abstract
Three-dimensional (3D) point clouds are increasingly used in applications such as autonomous driving, robotics, and virtual reality (VR). Point-based neural networks (PNNs) have demonstrated strong performance in point cloud analysis, originally targeting small-scale inputs. However, as PNNs evolve to process large-scale point clouds with hundreds of thousands of points, all-to-all computation and global memory access introduce substantial overhead, causing $O(n^2)$ computational complexity and memory traffic, where $n$ is the number of points. Existing accelerators, primarily optimized for small-scale workloads, overlook this challenge and scale poorly due to inefficient partitioning and non-parallel architectures. To address these issues, we propose FractalCloud, a fractal-inspired hardware architecture for efficient large-scale 3D point cloud processing. FractalCloud introduces two key optimizations: (1) a co-designed Fractal method for shape-aware, hardware-friendly partitioning, and (2) block-parallel point operations that decompose and parallelize all point operations. A dedicated hardware design with on-chip fractal interconnects and flexible parallelism further enables fully parallel processing within limited memory resources. Implemented as a chip layout in 28 nm technology with a core area of 1.5 mm$^2$, FractalCloud achieves a 21.7× speedup and 27× energy reduction over state-of-the-art accelerators while maintaining network accuracy, demonstrating its scalability and efficiency for PNN inference.
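To make the $O(n^2)$ bottleneck concrete, the sketch below shows a naive all-to-all nearest-neighbor search, the kind of point operation (e.g., kNN grouping in PNNs) whose cost grows quadratically with the number of points. This is an illustration of the problem, not the FractalCloud method; the function name and data are ours.

```python
def nearest_neighbor(points):
    """For each point, find the index of its nearest other point.

    points: list of (x, y, z) tuples. Every point is compared against
    every other point, so the loop performs O(n^2) distance
    evaluations -- the all-to-all pattern that dominates large-scale
    point cloud processing.
    """
    n = len(points)
    nearest = [None] * n
    for i in range(n):
        best_d, best_j = float("inf"), -1
        for j in range(n):  # all-to-all: n comparisons per point
            if i == j:
                continue
            # squared Euclidean distance (no sqrt needed for argmin)
            d = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            if d < best_d:
                best_d, best_j = d, j
        nearest[i] = best_j
    return nearest

pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
print(nearest_neighbor(pts))  # → [2, 2, 0]
```

Spatial partitioning schemes (such as the shape-aware fractal co-partitioning described above) aim to restrict these comparisons to points within the same tile, shrinking both compute and memory traffic.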