🤖 AI Summary
This work addresses the need for efficient, privacy-preserving, low-latency continual learning of human actions on edge devices, in applications such as AR/VR and robotics. It presents CLANE, the first continual on-device learning pipeline for event-based action recognition deployed on neuromorphic hardware (Intel Loihi 2). The system combines a spiking 2D CNN for spatiotemporal feature extraction with a CLP-SNN learning head. Two new Loihi 2 modules extend that head to action clips for efficient online learning: a Temporal Aggregation Layer and a fixed-point Normalization Layer. On the THU E-ACT-50 dataset, the system reaches 70.4% accuracy in a continual learning task. Compared with a sequential CNN+GRU+CLP edge GPU baseline, it uses more than two orders of magnitude less energy and has 16× lower latency.
📝 Abstract
Recognizing and continuously learning novel human actions without forgetting prior classes is a requirement for emerging AR/VR and robotics applications. For these applications, both on-device processing and on-device learning are essential for privacy and low-latency adaptation. Event cameras make visual sensing efficient through sparse, asynchronous output that is naturally compatible with neuromorphic processing. Yet no prior system has deployed a continual on-device learning pipeline for event-based action recognition on neuromorphic hardware. We present CLANE, Continual Learning of Actions on Neuromorphic Hardware from Event Cameras, deployed end-to-end on Intel Loihi 2. CLANE combines a spiking 2D CNN for spatiotemporal feature extraction with CLP-SNN as its on-chip learning head. It extends CLP-SNN to action clips via a Temporal Aggregation Layer and a fixed-point Normalization Layer, both novel Loihi 2 modules. On THU E-ACT-50, a 50-class dataset captured under real-world conditions, CLANE achieves 70.4% accuracy in a continual learning task. Compared with a sequential CNN+GRU+CLP edge GPU baseline, it reduces energy by more than 100× and latency by 16×, as validated through iso-algorithm cross-platform benchmarking across three evaluation levels.
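The abstract names the stages of the pipeline but not their internals. The pure-Python sketch below only illustrates the data flow from per-frame spike features to a clip-level decision, under explicit assumptions. All names (`aggregate_clip`, `fixed_point_l2_normalize`, `PrototypeHead`) are hypothetical. Temporal aggregation is assumed to be a plain sum of spike counts over frames. Normalization is assumed to be integer-only L2 normalization in a Q-format with `frac_bits` fractional bits. The learning head is a generic nearest-prototype online learner with a shift-based update, not the published CLP-SNN rule. Nothing here uses Loihi 2 tooling.

```python
import math
from typing import Dict, List, Optional


def aggregate_clip(frame_spikes: List[List[int]]) -> List[int]:
    """Sum per-frame spike counts into one clip-level feature vector.

    Hypothetical stand-in for a temporal aggregation layer (assumed: plain summation).
    """
    total = [0] * len(frame_spikes[0])
    for frame in frame_spikes:
        for i, count in enumerate(frame):
            total[i] += count
    return total


def fixed_point_l2_normalize(vec: List[int], frac_bits: int = 8) -> List[int]:
    """Integer-only, approximately unit-L2 normalization.

    Output is fixed point with `frac_bits` fractional bits (1.0 == 1 << frac_bits).
    Because the norm is floored by isqrt, the result can slightly exceed unit norm.
    """
    norm = math.isqrt(sum(v * v for v in vec))  # floor of the Euclidean norm
    if norm == 0:
        return [0] * len(vec)
    return [(v << frac_bits) // norm for v in vec]


class PrototypeHead:
    """Generic nearest-prototype online learner (NOT the published CLP-SNN rule)."""

    def __init__(self, shift: int = 2) -> None:
        self.prototypes: Dict[str, List[int]] = {}
        self.shift = shift  # learning rate 2**-shift, applied as an arithmetic shift

    def predict(self, x: List[int]) -> Optional[str]:
        if not self.prototypes:
            return None
        # The class whose prototype has the largest integer dot product with x wins.
        return max(
            self.prototypes,
            key=lambda c: sum(a * b for a, b in zip(self.prototypes[c], x)),
        )

    def learn(self, x: List[int], label: str) -> None:
        proto = self.prototypes.get(label)
        if proto is None:
            # New class: allocate a prototype and leave existing ones untouched.
            self.prototypes[label] = list(x)
            return
        # Known class: move the prototype toward x by (x - p) / 2**shift (floored).
        self.prototypes[label] = [p + ((xi - p) >> self.shift) for p, xi in zip(proto, x)]


if __name__ == "__main__":
    head = PrototypeHead()
    wave = fixed_point_l2_normalize(aggregate_clip([[3, 0, 1], [2, 0, 0]]))
    kick = fixed_point_l2_normalize(aggregate_clip([[0, 4, 0], [0, 3, 1]]))
    head.learn(wave, "wave")  # first class
    head.learn(kick, "kick")  # a later class is added without retraining "wave"
    print(head.predict(wave), head.predict(kick))  # wave kick
```

Adding a class only allocates a new prototype and never rewrites existing ones. This is one simple way to avoid overwriting earlier classes in this toy setting. The integer-only arithmetic reflects the abstract's emphasis on a fixed-point layer, but the specific format here is an assumption.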