🤖 AI Summary
Continual learning at the edge is hindered by the high energy consumption of conventional recurrent networks and by frequent off-chip data transfers. This paper proposes M2RU, a memristor-based mixed-signal architecture tailored for edge deployment, which for the first time integrates weighted-bit streaming with on-chip, hardware-level experience replay to enable low-power, real-time online temporal modeling and domain adaptation. Through crossbar optimization, device reliability modeling, and precision-enhancing encoding, M2RU mitigates memristor non-idealities and reduces catastrophic forgetting. M2RU reaches an energy efficiency of 312 GOPS/W, a 29× improvement over a CMOS digital design, while keeping accuracy within 5% of software baselines on sequential MNIST and CIFAR-10 tasks. The projected device lifetime is 12.2 years under continual learning workloads.
📝 Abstract
Continual learning on edge platforms remains challenging because recurrent networks depend on energy-intensive training procedures and frequent data movement that are impractical for embedded deployments. This work introduces M2RU, a mixed-signal architecture that implements the minion recurrent unit for efficient temporal processing with on-chip continual learning. The architecture integrates weighted-bit streaming, which enables multi-bit digital inputs to be processed in crossbars without high-resolution conversion, and an experience replay mechanism that stabilizes learning under domain shifts. M2RU achieves 15 GOPS at 48.62 mW, corresponding to 312 GOPS per watt, and maintains accuracy within 5 percent of software baselines on sequential MNIST and CIFAR-10 tasks. Compared with a CMOS digital design, the accelerator provides a 29× improvement in energy efficiency. Device-aware analysis shows an expected operational lifetime of 12.2 years under continual learning workloads. These results establish M2RU as a scalable and energy-efficient platform for real-time adaptation in edge-level temporal intelligence.
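
The idea behind weighted-bit streaming can be illustrated with a small numerical sketch. This is not the M2RU circuit itself: it assumes unsigned integer inputs, an ideal noise-free crossbar modeled as a plain matrix, and digital accumulation of the per-bit partial sums. The function name `weighted_bit_stream_mvm` and the parameter `n_bits` are hypothetical. The sketch shows only that feeding one bit-plane at a time and scaling each partial result by its bit significance reproduces the full multi-bit matrix-vector product, so the crossbar only ever sees 1-bit inputs.

```python
import numpy as np

def weighted_bit_stream_mvm(x, W, n_bits=4):
    """Matrix-vector product computed one input bit-plane at a time.

    x: 1-D array of unsigned integers in [0, 2**n_bits).
    W: 2-D weight matrix (idealized stand-in for crossbar conductances).
    """
    x = np.asarray(x, dtype=np.int64)
    acc = np.zeros(W.shape[0], dtype=np.float64)
    for b in range(n_bits):
        bit_plane = (x >> b) & 1      # binary vector driven onto the crossbar inputs
        partial = W @ bit_plane       # one crossbar pass with 1-bit inputs
        acc += (2 ** b) * partial     # weight the partial sum by bit significance
    return acc

# Usage: the streamed result matches the direct multi-bit product.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
x = rng.integers(0, 16, size=5)       # 4-bit inputs
assert np.allclose(weighted_bit_stream_mvm(x, W), W @ x)
```

In real hardware the per-bit weighting and accumulation may happen in the analog domain and device non-idealities would perturb each pass; the sketch omits both.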