🤖 AI Summary
Background subtraction (BGS) in dynamic and complex scenes is highly susceptible to illumination changes, camera motion, and atmospheric turbulence. To address these challenges, this paper proposes a spiking neural network (SNN)-based autoencoder framework, SAEN-BGS. Its key contributions are: (1) a continuous spiking conv-and-dconv block that strengthens temporal modeling and robustness to noise; and (2) a self-distillation spiking supervised learning method, built on ANN-to-SNN frameworks, that preserves high accuracy while substantially improving energy efficiency. Experiments on CDnet-2014 and DAVIS-2016 demonstrate that the method achieves superior foreground segmentation accuracy under challenging conditions, including abrupt illumination changes, camera jitter, and atmospheric distortion, outperforming state-of-the-art BGS approaches while significantly reducing inference power consumption.
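The temporal modeling the summary attributes to SNNs comes from stateful spiking neurons that integrate inputs over time and emit binary spikes. The sketch below shows a minimal leaky integrate-and-fire (LIF) layer; the function name, threshold, and decay constant are illustrative assumptions, not parameters from the paper:

```python
import numpy as np

def lif_forward(inputs, v_th=1.0, decay=0.5):
    """Simulate a leaky integrate-and-fire (LIF) neuron layer over time.

    inputs: array of shape (T, N) -- input currents for T time steps.
    Returns a binary spike train of the same shape.
    """
    T, N = inputs.shape
    v = np.zeros(N)                       # membrane potentials
    spikes = np.zeros((T, N))
    for t in range(T):
        v = decay * v + inputs[t]         # leaky integration of input
        fired = v >= v_th                 # spikes on threshold crossing
        spikes[t] = fired.astype(float)
        v = np.where(fired, 0.0, v)       # hard reset after a spike
    return spikes
```

Because the membrane potential must accumulate across several steps before a spike fires, brief input fluctuations tend to be filtered out, which is one intuition behind the noise resilience claimed for spiking layers.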
📝 Abstract
Background subtraction (BGS) detects moving objects in video and is commonly employed as an early stage of object tracking and human recognition pipelines. Nevertheless, existing deep-learning-based BGS techniques still struggle with various background noises in videos, including lighting variations, shifts in camera angle, and disturbances such as air turbulence or swaying trees. To address this problem, we design a spiking autoencoder network, termed SAEN-BGS, which exploits the noise resilience and time-sequence sensitivity of spiking neural networks (SNNs) to better separate foreground from background. To suppress unnecessary background noise while preserving the important foreground elements, we first create the continuous spiking conv-and-dconv block, which serves as the fundamental building block of the decoder in SAEN-BGS. Moreover, to achieve higher energy efficiency, we introduce a novel self-distillation spiking supervised learning method grounded in ANN-to-SNN frameworks, which reduces power consumption. Extensive experiments on the CDnet-2014 and DAVIS-2016 datasets demonstrate that our approach achieves superior segmentation performance relative to baseline methods, even in complex scenarios with dynamic backgrounds.
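The abstract's ANN-to-SNN self-distillation can be pictured as matching the SNN student's firing rates to an ANN teacher's soft outputs. The sketch below shows only this general idea under assumed details (cross-entropy objective, rate-coded outputs); the paper's actual loss and training procedure may differ:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_distill_loss(ann_logits, snn_spike_counts, n_steps):
    """Cross-entropy between ANN teacher probabilities and SNN firing rates.

    ann_logits: (B, C) teacher outputs.
    snn_spike_counts: (B, C) output spikes accumulated over n_steps;
    normalized firing rates serve as the student's probabilities.
    """
    teacher = softmax(ann_logits)
    rates = snn_spike_counts / n_steps
    eps = 1e-8                                      # avoid log(0)
    student = (rates + eps) / (rates + eps).sum(axis=-1, keepdims=True)
    return float(-(teacher * np.log(student)).mean())
```

A student whose firing rates align with the teacher's distribution incurs a lower loss than one whose rates are inverted, which is the supervisory signal such a distillation scheme relies on.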