🤖 AI Summary
This work addresses the challenge of efficiently executing three-dimensional convolutional neural networks (3D CNNs) on conventional silicon-based hardware: the computational complexity of 3D convolution scales cubically, which makes it hard to achieve both energy efficiency and processing speed. The authors propose a hybrid optoelectronic architecture that offloads the 3D convolutional layer to an opto-atomic spatio-temporal holographic correlator (STHC). The correlator stores temporal information as atomic coherence in an array of inhomogeneously broadened cold rubidium-85 atoms and pairs it with a conventional two-dimensional spatial correlator, so that correlation is performed in space and time simultaneously. Using large parallel kernels (30×40 pixels spatially, 8 frames temporally), the approach achieves 59.72% classification accuracy on a four-class human action dataset, with projected operating speeds of up to 125,000 frames per second, and points toward massively accelerated video classification.
📝 Abstract
Three-dimensional convolutional neural networks (3D CNNs) have demonstrated remarkable performance in video recognition tasks by processing both spatial and temporal features. However, the cubic scaling of their computational complexity poses significant time and energy efficiency challenges for conventional silicon-based hardware. To address this, we propose a hybrid optoelectronic architecture that delegates the computationally intensive 3D convolutional layer to an opto-atomic Spatio-temporal Holographic Correlator (STHC). This system stores temporal information as atomic coherence in an array of inhomogeneously broadened cold Rubidium-85 atoms and combines it with a traditional 2D spatial correlator to perform correlation in both space and time simultaneously. Our results on a four-class human action dataset demonstrate a classification accuracy of 59.72% using parallel large-scale kernels (30×40 pixels spatially, 8 frames temporally), with potential operating speeds projected up to 125,000 frames per second. This approach offers a pathway to massively accelerated video classification.
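For readers who want a concrete picture of the operation being offloaded, the following is a minimal numerical sketch of joint space-time correlation, not a model of the STHC hardware. It assumes circular (wrap-around) boundaries and a (frames, height, width) array layout; the function names, the video size, and the choice of 30 rows by 40 columns for the kernel are illustrative assumptions, since the abstract does not specify them. The Fourier-domain version multiplies the video spectrum by the conjugate kernel spectrum, which is the standard mathematical identity behind correlators in general.

```python
import numpy as np

def correlate3d_fft(video, kernel):
    """Circular 3D cross-correlation of video with a real kernel via FFT.

    Both arrays are indexed (frame, row, column). The kernel is zero-padded
    to the video size, so the output has the same shape as the video.
    """
    V = np.fft.fftn(video)
    K = np.fft.fftn(kernel, s=video.shape)  # zero-pad kernel to video size
    # Correlation theorem: multiply by the conjugate spectrum of the kernel.
    return np.real(np.fft.ifftn(V * np.conj(K)))

def correlate3d_direct(video, kernel):
    """Reference implementation: the same circular correlation by summation."""
    out = np.zeros(video.shape)
    t, h, w = kernel.shape
    for dt in range(t):
        for dy in range(h):
            for dx in range(w):
                # np.roll with a negative shift gives video[n + d] at index n.
                shifted = np.roll(video, shift=(-dt, -dy, -dx), axis=(0, 1, 2))
                out += kernel[dt, dy, dx] * shifted
    return out

# Illustrative sizes: kernel of 8 frames x 30 x 40 pixels, as in the abstract;
# the video size and the (rows, columns) orientation are assumptions.
rng = np.random.default_rng(0)
kernel = rng.normal(size=(8, 30, 40))
video = 0.1 * rng.normal(size=(16, 48, 64))   # weak background noise
offset = (3, 10, 20)                          # hypothetical event location
video[3:11, 10:40, 20:60] += kernel           # embed the pattern at the offset

corr = correlate3d_fft(video, kernel)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(corr), corr.shape))
macs_per_output = kernel.size                 # 8 * 30 * 40 multiply-adds
print("correlation peak at", peak, "| MACs per output voxel:", macs_per_output)
```

A direct digital evaluation needs 8 × 30 × 40 = 9,600 multiply-accumulate operations per output voxel for a single kernel of this size, which illustrates why kernels this large are expensive on conventional hardware.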