A 33.6-136.2 TOPS/W Nonlinear Analog Computing-In-Memory Macro for Multi-bit LSTM Accelerator in 65 nm CMOS

📅 2025-12-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low energy efficiency of analog computing-in-memory (ACIM) LSTM accelerators, which stems from the high proportion of nonlinear operations typically executed digitally, this work proposes an ACIM macro supporting multi-bit nonlinear operations. The method introduces: (1) a reconfigurable 1–5-bit nonlinear in-memory (NLIM) analog-to-digital converter (ADC) built on dual-9T bitcells with decoupled read/write paths, supporting signed inputs and ternary weights; (2) a read-word-line underdrive cascode (RUDC) structure that extends the read-bitline dynamic range; and (3) a dual-supply 6T-SRAM array for efficient multi-bit weight operations, reducing bitcell count by 7.8× and latency by 4×, with a replica-bias strategy providing robustness to temperature variation. Measurements show the 5-bit NLIM ADC approximates nonlinear activations with average error <1 LSB. On a 12-class keyword-spotting task, the prototype delivers 92.0% on-chip accuracy, with 2.2× higher normalized energy efficiency and 1.6× better normalized area efficiency than prior ACIM designs.

📝 Abstract
The energy efficiency of analog computing-in-memory (ACIM) accelerators for recurrent neural networks, particularly long short-term memory (LSTM) networks, is limited by the high proportion of nonlinear (NL) operations typically executed digitally. To address this, we propose an LSTM accelerator incorporating an ACIM macro with a reconfigurable (1-5 bit) nonlinear in-memory (NLIM) analog-to-digital converter (ADC) that computes NL activations directly in the analog domain using: 1) a dual-9T bitcell with decoupled read/write paths for signed-input and ternary-weight operations; 2) a read-word-line underdrive cascode (RUDC) technique achieving 2.8X higher read-bitline dynamic range than single-transistor designs (1.4X better than a conventional cascode structure, with 7X lower current variation); 3) a dual-supply 6T-SRAM array for efficient multi-bit weight operations, reducing both bitcell count (7.8X) and latency (4X) for 5-bit weight operations. We experimentally demonstrate a 5-bit NLIM ADC for approximating NL activations in LSTM cells, achieving average error <1 LSB. Simulation confirms the robustness of the NLIM ADC against temperature variations thanks to the replica-bias strategy. Our design achieves 92.0% on-chip inference accuracy on a 12-class keyword-spotting task while demonstrating 2.2X higher system-level normalized energy efficiency and 1.6X better normalized area efficiency than state-of-the-art works. The results combine physical measurements of a macro unit, which accounts for the majority of LSTM operations (99% of linear and 80% of nonlinear operations), with simulations of the remaining components, including additional LSTM and fully connected layers.
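To make the "<1 LSB" accuracy claim concrete, here is a minimal behavioral sketch (Python/NumPy, not from the paper) of an LSTM cell whose sigmoid and tanh gate activations are quantized to a reconfigurable n-bit output grid, standing in for what the NLIM ADC computes in the analog domain. The function names, shapes, and the uniform-rounding model are illustrative assumptions, not the paper's circuit.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def quantize_activation(x, fn, n_bits=5):
    """Nonlinear activation followed by an n_bits uniform output grid,
    a behavioral stand-in for the reconfigurable (1-5 bit) NLIM ADC."""
    y = fn(x)
    lo, hi = (0.0, 1.0) if fn is sigmoid else (-1.0, 1.0)  # sigmoid vs tanh range
    lsb = (hi - lo) / (2 ** n_bits - 1)
    return lo + np.round((y - lo) / lsb) * lsb

def lstm_cell(x, h, c, W, U, b, n_bits=5):
    """One LSTM step with all gate activations quantized to n_bits."""
    z = W @ x + U @ h + b                       # linear MACs: the ACIM array's job
    i_, f_, g_, o_ = np.split(z, 4)
    i = quantize_activation(i_, sigmoid, n_bits)
    f = quantize_activation(f_, sigmoid, n_bits)
    g = quantize_activation(g_, np.tanh, n_bits)
    o = quantize_activation(o_, sigmoid, n_bits)
    c_new = f * c + i * g
    h_new = o * quantize_activation(c_new, np.tanh, n_bits)
    return h_new, c_new

# Error in LSBs of a 5-bit quantized sigmoid over a typical pre-activation range
xs = np.linspace(-6, 6, 1001)
lsb = 1.0 / (2 ** 5 - 1)
err = np.abs(quantize_activation(xs, sigmoid, 5) - sigmoid(xs)) / lsb
print(f"max error: {err.max():.2f} LSB")        # uniform rounding stays <= 0.5 LSB
```

In this idealized model the quantization error is bounded by 0.5 LSB; the paper's measured "<1 LSB average error" additionally absorbs analog non-idealities that this sketch does not capture.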
Problem

Research questions and friction points this paper is trying to address.

Improves energy efficiency of analog computing-in-memory for LSTM networks
Enables direct analog computation of nonlinear activations in memory
Reduces area and latency for multi-bit weight operations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reconfigurable NLIM ADC for analog nonlinear activation (see the sketch after this list)
RUDC technique enhances read-bitline dynamic range
Dual-supply SRAM reduces bitcell count and latency
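One common way an ADC can fold a nonlinear activation into the conversion itself is to place its comparison thresholds at the inverse of the activation function at each code boundary, so the raw output code already encodes the quantized activation. The flash-style sketch below (Python, hypothetical: `nlim_thresholds` and `nlim_convert` are illustrative names) models that idea for a 5-bit sigmoid; it is a generic behavioral model, not the paper's dual-9T/RUDC circuit.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inv_sigmoid(y):
    return np.log(y / (1.0 - y))

def nlim_thresholds(n_bits=5):
    """Thresholds at the inverse-sigmoid of each code boundary, so the
    output code directly encodes sigmoid(input)."""
    levels = 2 ** n_bits
    boundaries = np.arange(1, levels) / levels   # (levels - 1) boundaries in (0, 1)
    return inv_sigmoid(boundaries)

def nlim_convert(v, n_bits=5):
    """Flash-style conversion: count thresholds below the analog input v."""
    th = nlim_thresholds(n_bits)
    return np.sum(v[..., None] > th, axis=-1)    # integer code, 0 .. 2**n_bits - 1

v = np.linspace(-6, 6, 7)
codes = nlim_convert(v, n_bits=5)
decoded = (codes + 0.5) / 2 ** 5                 # mid-rise decode to activation domain
print(np.round(decoded - sigmoid(v), 3))         # residuals bounded by 0.5 LSB
```

The design point this illustrates: once the thresholds encode the nonlinearity, no separate digital activation unit is needed after the conversion, which is the efficiency argument the paper makes for computing NL activations in the analog domain.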
Junyi Yang
Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China
Xinyu Luo
Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China
Ye Ke
Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China
Zheng Wang
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Hongyang Shang
Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China
Shuai Dong
Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China
Zhengnan Fu
Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China
Xiaofeng Yang
Reexen Technology, Shenzhen 518000, China
Hongjie Liu
Reexen Technology, Shenzhen 518000, China
Arindam Basu
Professor, City University of Hong Kong (past Associate Professor of EEE at NTU)
Neuromorphic, Analog IC, Neuromorphic Engineering, Computing-In-Memory, Brain-machine interface