AI Summary
To address the trade-off between hardware overhead and temporal fidelity in time-surface construction for real-time neuromorphic event-camera processing, this work proposes a 3D-stacked architecture that integrates sensing, memory, and computation. It exploits DRAM leakage characteristics to perform timestamp normalization in hardware, introducing an exponential-decay real-time normalization mechanism. The architecture combines customized MOM capacitors, ultra-low-leakage switches, and an analog time-surface array to support in-sensor memory and computation. By breaking the memory wall, it achieves 69× lower power, 2.2× lower latency, and 1.9× smaller area than its 2D counterpart. Driving GoogLeNet, the time-surface representation attains 99%, 85%, 78%, and 97% accuracy on N-MNIST, N-Caltech101, CIFAR10-DVS, and DVS128 Gesture, respectively. Image reconstruction achieves an average SSIM of 0.62, the highest among the compared methods.
Abstract
This work proposes a 3D Stack In-Sensor-Computing (3DS-ISC) architecture for efficient event-based vision processing. A real-time normalization method using an exponential decay function is introduced to construct the time surface, reducing hardware usage while preserving temporal information. The circuit design exploits the leakage characteristics of Dynamic Random Access Memory (DRAM) for timestamp normalization. Custom interdigitated metal-oxide-metal capacitors (MOMCAPs) store the charge, and low-leakage switches (LL switches) extend the effective charge-storage time. The 3DS-ISC architecture integrates sensing, memory, and computation to overcome the memory-wall problem, reducing power, latency, and area by 69×, 2.2×, and 1.9×, respectively, compared with its 2D counterpart. Moreover, compared with works that use 16-bit SRAM to store timestamps, the ISC analog array reduces power consumption by three orders of magnitude. In real computer vision (CV) tasks, we applied the spatial-temporal correlation filter (STCF) for denoising, and 3DS-ISC achieved nearly equivalent accuracy to a digital implementation using high-precision timestamps. For image classification, the time surface constructed by 3DS-ISC is used as the input to GoogLeNet, achieving 99% on N-MNIST, 85% on N-Caltech101, 78% on CIFAR10-DVS, and 97% on DVS128 Gesture, comparable with state-of-the-art results on each dataset. Additionally, the 3DS-ISC method is applied to image reconstruction on the DAVIS240C dataset, achieving the highest average SSIM (0.62) among three methods. This work establishes a foundation for real-time, resource-efficient event-based processing and points to future integration of advanced computational circuits for broader applications.
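For readers unfamiliar with time surfaces, the exponential-decay normalization that the architecture realizes in analog hardware (via DRAM-like charge leakage) can be sketched in software. The following minimal NumPy illustration is our own sketch, not the paper's implementation; the function names, event format `(x, y, t)`, and the decay constant `tau` are assumptions for illustration:

```python
import numpy as np

def time_surface(last_ts, t_now, tau):
    """Exponential-decay time surface: a pixel that fired recently maps
    to a value near 1; stale pixels decay toward 0, mirroring how stored
    charge leaks away over time in the analog array."""
    surface = np.exp(-(t_now - last_ts) / tau)
    surface[last_ts < 0] = 0.0  # pixels that never received an event
    return surface

def process_events(events, shape, tau=0.05):
    """Fold a stream of (x, y, t) events into a per-pixel map of the
    latest timestamp, then normalize it at the final event time."""
    last_ts = np.full(shape, -1.0)  # -1 marks "no event yet"
    for x, y, t in events:
        last_ts[y, x] = t
    t_now = max(t for _, _, t in events)
    return time_surface(last_ts, t_now, tau)
```

In a digital implementation this requires storing a full-precision timestamp per pixel (the 16-bit SRAM baseline the abstract mentions); the decay itself is just an elementwise exponential, which is what the leakage of the charge-storage cell approximates for free in the 3DS-ISC design.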