🤖 AI Summary
To address the high timestep overhead, substantial transmission costs, and excessive energy consumption of edge–cloud collaborative spiking neural network (SNN) inference, this paper proposes a brain-inspired co-inference architecture that jointly optimizes spatial and temporal redundancy. Our approach features: (1) a learning-driven spike compression module that adaptively reduces temporal redundancy; (2) a dynamic early-exit mechanism that terminates inference at the edge once output confidence is sufficiently high; and (3) a lightweight edge–cloud collaboration framework supporting dual-modal inputs (static images and event streams). Evaluated on real hardware platforms, the proposed method achieves up to a 2048× reduction in communication volume, over 90% reduction in edge energy consumption, and up to a 3× reduction in end-to-end latency compared to edge-only inference, while keeping accuracy degradation below 2%.
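The summary does not spell out how the early-exit decision is made, only that it is driven by output confidence accumulated over timesteps. A minimal sketch of one common realization (running softmax confidence over accumulated per-timestep outputs; the function name, threshold, and averaging scheme are illustrative assumptions, not the paper's exact design):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def early_exit_inference(step_logits, conf_threshold=0.9):
    """Accumulate per-timestep output scores (e.g. spike counts) and
    stop as soon as the running softmax confidence clears the
    threshold, instead of always running the full timestep window.

    Returns (predicted_class, timesteps_used)."""
    n_classes = len(step_logits[0])
    acc = [0.0] * n_classes
    for t, logits in enumerate(step_logits, start=1):
        acc = [a + l for a, l in zip(acc, logits)]
        probs = softmax([a / t for a in acc])  # time-averaged evidence
        conf = max(probs)
        if conf >= conf_threshold:
            return probs.index(conf), t  # confident: exit at the edge
    return probs.index(conf), t          # ran the full window
```

With a strongly discriminative input the loop exits after a single timestep; with ambiguous scores it runs the full window, which is the behavior that lets confident samples skip most of the timestep cost.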
📝 Abstract
Spiking Neural Networks (SNNs) offer significant potential for energy-efficient intelligence at the edge. However, performing full SNN inference at the edge is challenging due to the latency and energy costs of its fixed, high timestep overhead. Edge–cloud co-inference systems present a promising solution, but their deployment is often hindered by high latency and feature-transmission costs. To address these issues, we introduce NeuCODEX, a neuromorphic co-inference architecture that jointly optimizes both spatial and temporal redundancy. NeuCODEX incorporates a learned spike-driven compression module to reduce data transmission and employs a dynamic early-exit mechanism to adaptively terminate inference based on output confidence. We evaluated NeuCODEX on both static images (CIFAR10 and Caltech) and neuromorphic event streams (CIFAR10-DVS and N-Caltech). To demonstrate practicality, we prototyped NeuCODEX on ResNet-18 and VGG-16 backbones in a real edge-to-cloud testbed. The proposed system reduces data transfer by up to 2048×, cuts edge energy consumption by over 90%, and lowers end-to-end latency by up to 3× compared to edge-only inference, all with a negligible accuracy drop of less than 2%. NeuCODEX thereby enables practical, high-performance SNN deployment in resource-constrained environments.
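The paper's compression module is learned, so the sketch below is not its method; it only illustrates why binary spike features transmit cheaply in the first place. Because each spike is a single bit, a flat spike sequence can be bit-packed into bytes for an immediate 8× saving over one-byte-per-spike encoding (and 32× over float32 activations), before any learned spatial or temporal reduction is applied. The function name and the divisible-by-8 assumption are illustrative:

```python
def pack_spikes(spike_bits):
    """Bit-pack a flat sequence of binary spikes (0/1) into bytes.

    Assumes len(spike_bits) is a multiple of 8 for simplicity.
    One byte per 8 spikes: an 8x reduction versus byte-per-spike,
    and 32x versus sending float32 activations for the same tensor.
    """
    assert len(spike_bits) % 8 == 0, "pad the spike train to a byte boundary"
    out = bytearray()
    for i in range(0, len(spike_bits), 8):
        byte = 0
        for b in spike_bits[i:i + 8]:
            byte = (byte << 1) | (b & 1)  # MSB-first packing
        out.append(byte)
    return bytes(out)
```

Bit-packing alone does not reach the reported up-to-2048× reduction; that figure reflects the learned module's additional spatial and temporal compression on top of the binary representation.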