🤖 AI Summary
Existing analog neural-network implementations are limited to feedforward architectures, because noise accumulating through temporal feedback has made always-on, ultra-low-power recurrent neural networks (RNNs) seem impractical. This work uses hardware-software co-design to show that Bistable Memory Recurrent Units (BMRUs), a class of RNNs with discrete-valued outputs and hysteretic dynamics, admit a current-mode analog implementation in which each learned parameter corresponds one-to-one to a circuit element. The discrete outputs suppress analog noise by at least 20-fold at each cell boundary, which breaks the noise accumulation. Transistor-level simulations in 180 nm CMOS show near-perfect agreement between software predictions and circuit-level behavior. The power cost of adding recurrence scales linearly with state dimension, while the feedforward layers that dominate total power scale quadratically. In an end-to-end keyword spotting task, the RNN core achieves sub-microwatt inference.
📝 Abstract
Always-on AI applications, from environmental sensors to biomedical implants, require ultra-low power consumption. Analog circuits offer a path to sub-microwatt inference, yet existing analog implementations are limited to feedforward architectures: extending them to recurrent dynamics has been considered impractical due to noise accumulation through temporal feedback. We demonstrate that this barrier can be overcome through hardware-software co-design. Specifically, we identify that Bistable Memory Recurrent Units (BMRUs), a class of Recurrent Neural Networks (RNNs) with discrete-valued outputs and hysteretic dynamics, admit an ultra-low-power current-mode analog implementation, which we design from first principles. The resulting circuit establishes a one-to-one correspondence between each learned parameter and a circuit element. The discrete outputs suppress analog noise by at least 20-fold at each cell boundary, breaking the noise accumulation that prevents analog recurrence. We reformulate BMRUs for first-quadrant operation with fixed thresholds, enabling the direct correspondence while preserving expressivity and trainability. Transistor-level simulations in 180 nm Complementary Metal-Oxide-Semiconductor (CMOS) show near-perfect agreement between software predictions and circuit-level behavior, with the software model thereby serving as a high-fidelity simulator of the physical hardware at low computational cost. We leverage this fidelity to conduct large-scale noise immunity and power scaling analyses: the power cost of adding recurrence scales linearly with state dimension, while the feedforward layers dominating total power scale quadratically, meaning recurrence is added at linear marginal cost relative to the feedforward backbone. End-to-end keyword spotting achieves sub-microwatt inference at the RNN core.
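To make the noise-suppression argument concrete, the following is a minimal sketch of a generic hysteretic, binary-output unit (a Schmitt-trigger-style update). It is not the BMRU formulation from the paper: the function names `hysteretic_step` and `run`, the thresholds `low` and `high`, and the input values are all hypothetical and chosen only for illustration. It shows how a discrete output with hysteresis can return the same state sequence for a clean and a perturbed input, so small analog errors are not carried forward through the recurrence.

```python
def hysteretic_step(state, drive, low=0.3, high=0.7):
    """One update of an illustrative bistable unit (not the paper's equations).

    The output switches to 1 when the drive reaches `high`, to 0 when it
    falls to `low`, and otherwise holds its previous value (hysteresis).
    The threshold values are assumptions made for this sketch.
    """
    if drive >= high:
        return 1
    if drive <= low:
        return 0
    return state


def run(drives, state=0, low=0.3, high=0.7):
    """Apply the unit to a sequence of non-negative drives; return all outputs."""
    outputs = []
    for drive in drives:
        state = hysteretic_step(state, drive, low, high)
        outputs.append(state)
    return outputs


# Hypothetical clean input and a perturbed copy of it.
clean = [0.1, 0.9, 0.5, 0.5, 0.1]
noisy = [0.15, 0.85, 0.45, 0.58, 0.2]

# Both sequences give the same discrete states, so the perturbation is
# not passed on to the next time step.
print(run(clean))
print(run(noisy))
```

In this toy setting, a perturbation only changes the output if it pushes the drive across a threshold; inside the hold band between `low` and `high` the state is simply retained, which is the qualitative behavior the abstract attributes to discrete, hysteretic outputs.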