🤖 AI Summary
To address communication latency, security vulnerabilities, high energy consumption, the von Neumann bottleneck, and analog-to-digital/time-to-frequency conversion overhead in time-dependent signal processing at the edge, this work proposes a fully analog, end-to-end, time-domain speech-recognition architecture. We present the first heterogeneous integration of dopant network processing units (DNPUs) and analog in-memory computing (AIMC) chips: DNPUs emulate cochlear dynamics at the material level to extract temporal features directly in the analog domain, while AIMC chips, built from memristive crossbar arrays, perform ultra-low-power classification entirely in analog. This design circumvents the architectural constraints and conversion losses inherent in conventional digital processors. Evaluated on the TI-46-Word benchmark, the system achieves 96.2% accuracy, with DNPU feature extraction dissipating only hundreds of nanowatts and a single multiply-accumulate (MAC) operation consuming less than 10 fJ, marking substantial improvements in energy efficiency and real-time performance for edge intelligence.
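To give a feel for the "cochlea-like, time-domain feature extraction" described above, here is a conceptual numerical stand-in: a bank of leaky integrators with different time constants, each followed by a static nonlinearity. This is **not** the paper's DNPU device physics; all time constants, channel counts, and signal values below are illustrative assumptions, meant only to show conversion-free feature extraction on a raw waveform.

```python
import numpy as np

def cochlea_like_features(audio, n_channels=8, dt=1e-4):
    """Conceptual stand-in for DNPU analogue feature extraction.

    A bank of leaky integrators with logarithmically spaced time
    constants, each read out through a static nonlinearity, loosely
    mimics the cochlea's frequency-selective, nonlinear response.
    """
    taus = np.logspace(-4, -2, n_channels)       # illustrative time constants (s)
    states = np.zeros(n_channels)                # integrator states
    features = np.zeros(n_channels)              # accumulated nonlinear energy
    for sample in audio:
        states += dt / taus * (sample - states)  # leaky integration per channel
        features += np.tanh(states) ** 2         # nonlinear energy readout
    return features / len(audio)                 # time-averaged feature vector

# Toy input: a 440 Hz tone with a little noise, sampled at 10 kHz.
rng = np.random.default_rng(1)
t = np.arange(2000) * 1e-4
audio = np.sin(2 * np.pi * 440 * t) + 0.05 * rng.standard_normal(2000)
feats = cochlea_like_features(audio)
print(feats.shape)  # one feature per channel
```

In the actual system these dynamics emerge from the dopant network material itself rather than from an explicit filter bank; the sketch only illustrates the computation's shape.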
📝 Abstract
With the rise of decentralized computing, as in the Internet of Things, autonomous driving, and personalized healthcare, it is increasingly important to process time-dependent signals efficiently at the edge: right where the temporal data are collected, avoiding time-consuming, insecure, and costly communication with a centralized computing facility (or cloud). However, modern-day processors often cannot meet the constrained power and time budgets of edge systems because of intrinsic limitations imposed by their architecture (the von Neumann bottleneck) or by domain conversions (analogue-to-digital and time-to-frequency). Here, we propose an edge temporal-signal processor based on two in-materia computing systems for feature extraction and classification, reaching a software-level accuracy of 96.2% on the TI-46-Word speech-recognition task. First, a nonlinear, room-temperature dopant-network processing-unit (DNPU) layer realizes analogue, time-domain feature extraction directly from the raw audio signals, similar to the human cochlea. Second, an analogue in-memory computing (AIMC) chip, consisting of memristive crossbar arrays, implements a compact neural network, trained on the extracted features, for classification. With DNPU feature extraction consuming hundreds of nanowatts and AIMC-based classification having the potential for less than 10 fJ per multiply-accumulate operation, our findings offer a promising avenue for advancing the compactness, efficiency, and performance of heterogeneous smart edge processors through in-materia computing hardware.
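The AIMC classification step rests on a standard idea: a memristive crossbar performs a matrix-vector multiply physically, with weights stored as conductances, inputs applied as read voltages, and each output current given by Ohm's law summed along a column via Kirchhoff's current law. The sketch below simulates that computation numerically; the array sizes, conductance range, read voltage, noise model, and differential (G⁺/G⁻) weight mapping are common conventions assumed for illustration, not parameters taken from this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 64 extracted features -> 10 word classes.
n_features, n_classes = 64, 10

# Signed weights are mapped onto two non-negative conductance arrays
# (G+ and G-), a common differential encoding for memristive crossbars.
weights = rng.normal(0.0, 0.3, size=(n_classes, n_features))
g_max = 1e-4                                            # siemens, illustrative
scale = np.abs(weights).max()
g_pos = np.clip(weights, 0, None) / scale * g_max
g_neg = np.clip(-weights, 0, None) / scale * g_max

def aimc_mac(features, read_voltage=0.2, noise_sigma=0.01):
    """One analogue multiply-accumulate pass over the crossbar.

    Features are encoded as row voltages; each column current is the
    Kirchhoff sum of V * G contributions (Ohm's law). Multiplicative
    Gaussian noise crudely models device conductance variability.
    """
    v = features * read_voltage
    gp = g_pos * (1 + noise_sigma * rng.standard_normal(g_pos.shape))
    gn = g_neg * (1 + noise_sigma * rng.standard_normal(g_neg.shape))
    return gp @ v - gn @ v          # differential column currents, one per class

features = rng.uniform(0, 1, n_features)   # stand-in for DNPU outputs
currents = aimc_mac(features)
predicted = int(np.argmax(currents))       # class with the largest current
```

Because the multiply and accumulate happen in the conductance array itself, no weight data moves between memory and a processor, which is where the sub-10 fJ/MAC potential cited above comes from.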