🤖 AI Summary
To address the energy-efficiency and speed bottlenecks of matrix-vector multiplication (MVM) in neural networks, this work proposes a superconducting compute-in-memory architecture based on optimized Bistable Vortex Memory (BVM) arrays. The method introduces a sense-line layout with diagonal connections and a BVM array structure tailored for multiplication. It combines current-domain accumulation, Quantizer Buffers that encode the accumulated current as single-flux-quantum (SFQ) pulses, and T1 adder cells into a tiled, pulse-domain in-memory multiplier. The demonstrated 4-bit multiplier runs at a 20 GHz clock with an ultra-low latency of 50 ps, and a systolic MVM structure built from these multipliers also operates at 20 GHz. The architecture improves computational throughput and energy efficiency, delivering more than 10× higher throughput per watt than state-of-the-art CMOS and emerging non-volatile memory accelerators, and establishes a promising path toward ultra-high-speed, low-power neural-network hardware.
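To make the pulse-domain data path concrete, the sketch below models a 4-bit multiplier behaviorally in Python. It rests on assumptions of our own, not details from the paper: the cell in row i and column j contributes one unit of current when input bit a_i and stored bit b_j are both 1; diagonal sense line k collects every cell with i + j = k (the partial products of weight 2^k); an ideal quantizer buffer emits exactly one SFQ pulse per unit of current; and each T1 stage toggles on every pulse and passes a carry pulse to the next stage on each 1-to-0 transition. The names UNIT_CURRENT, sense_line_currents, quantizer_buffer, t1_adder_chain, and bvm_multiply are hypothetical, and the model ignores timing, noise, and circuit-level behavior.

```python
"""Behavioral sketch of a BVM-style tiled multiplier (hypothetical model)."""
from typing import List

UNIT_CURRENT = 1.0  # hypothetical current of one active cell (arbitrary units)


def sense_line_currents(a: int, b: int, bits: int = 4) -> List[float]:
    """Accumulate partial-product currents on diagonal sense lines.

    Assumption: cell (i, j) conducts UNIT_CURRENT only when input bit a_i and
    stored bit b_j are both 1; sense line k sums all cells with i + j == k,
    i.e. every partial product of weight 2**k.
    """
    currents = [0.0] * (2 * bits - 1)
    for i in range(bits):
        for j in range(bits):
            if (a >> i) & 1 and (b >> j) & 1:
                currents[i + j] += UNIT_CURRENT
    return currents


def quantizer_buffer(current: float) -> int:
    """Idealized QB: convert an accumulated current into a whole number of SFQ pulses."""
    return round(current / UNIT_CURRENT)


def t1_adder_chain(pulse_counts: List[int], width: int) -> List[int]:
    """Idealized T1 chain: each pulse toggles its column's bit, and a 1 -> 0
    toggle sends one carry pulse to the next column. Returns bits LSB first."""
    pending = list(pulse_counts) + [0] * (width - len(pulse_counts))
    state = [0] * width
    for k in range(width):
        # All carries into column k come from column k - 1, already processed.
        for _ in range(pending[k]):
            state[k] ^= 1
            if state[k] == 0:  # toggled 1 -> 0: propagate a carry pulse
                if k + 1 == width:
                    raise OverflowError("carry out of the most significant column")
                pending[k + 1] += 1
    return state


def bvm_multiply(a: int, b: int, bits: int = 4) -> int:
    """Multiply two unsigned integers via sense-line currents, QB pulses, and T1 carries."""
    if not (0 <= a < 2**bits and 0 <= b < 2**bits):
        raise ValueError(f"operands must be unsigned {bits}-bit integers")
    pulses = [quantizer_buffer(c) for c in sense_line_currents(a, b, bits)]
    result_bits = t1_adder_chain(pulses, width=2 * bits)
    return sum(bit << k for k, bit in enumerate(result_bits))


if __name__ == "__main__":
    print(bvm_multiply(13, 11))  # 143
```

Because each pulse on line k carries weight 2^k, the final T1 state encodes the sum over k of (pulse count on line k) × 2^k, which equals a × b. For 13 × 11 the line counts are [1, 1, 1, 3, 1, 1, 1], and the ripple of carries yields 143.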
📝 Abstract
Building on the previously introduced Bistable Vortex Memory (BVM), a novel, nonvolatile, high-density, and scalable superconductor memory technology, this work presents a methodology that uses BVM arrays to address computational challenges in data-driven algorithms and neural networks, focusing on matrix-vector multiplication (MVM). The approach performs arithmetic directly inside superconducting BVM arrays, enabling ultra-high-speed, energy-efficient in-memory computation. The design uses a tiled multiplier structure in which BVM's inherent current summation is combined with Quantizer Buffer (QB) cells that convert the accumulated analog current into a variable number of digital Single Flux Quantum (SFQ) pulses. T1 adder cells then process these pulses, performing binary addition and carry propagation to complete a functional multiplier unit. The paper then presents an efficient MVM architecture that arranges these BVM-based multipliers in a systolic array for parallel computation. A key innovation is a BVM array structure optimized for multiplication: compared with the general-purpose BVM array design, it restructures the Sense Lines (SLs) with diagonal connections to reduce area and adjusts the input scheme to improve computational efficiency. We demonstrate the approach with a 4-bit multiplier operating at 20 GHz with 50 ps latency and an MVM structure that also operates at 20 GHz. We further show how the multiplier design extends to Multiply-Accumulate (MAC) operations. By enabling high-speed in-memory computation, this work paves the way for power-efficient neural networks.
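The abstract describes arranging BVM multipliers in a systolic array for parallel MVM and extending the multiplier to MAC operations. The cycle-level Python sketch below illustrates how such a combination can compute y = W x, under assumptions that are ours and not the paper's: a weight-stationary dataflow in which processing element (PE) (i, j) holds the 4-bit weight W[i][j]; inputs move down columns and partial sums move across rows one step per cycle; and each PE performs one MAC per cycle. The names BITS, mac, and systolic_mvm, the dataflow, and the skewed injection schedule are hypothetical; the plain multiply inside mac stands in for a BVM multiplier tile.

```python
"""Cycle-level sketch of a weight-stationary systolic MVM built from MAC cells (hypothetical)."""
from typing import List, Optional

BITS = 4  # operand width of the modeled multipliers


def mac(acc: int, w: int, x: int) -> int:
    """Hypothetical MAC cell: one BITS-bit unsigned multiply, then accumulate."""
    if not (0 <= w < 2**BITS and 0 <= x < 2**BITS):
        raise ValueError(f"operands must be unsigned {BITS}-bit integers")
    return acc + w * x


def systolic_mvm(W: List[List[int]], x: List[int]) -> List[int]:
    """Compute y = W x on a rows x cols grid of PEs.

    PE (i, j) keeps W[i][j] stationary. x[j] enters column j at cycle j and
    moves down one row per cycle; row i's partial sum starts at cycle i and
    moves right one column per cycle, so they meet at PE (i, j) in cycle i + j.
    """
    rows, cols = len(W), len(x)
    if any(len(row) != cols for row in W):
        raise ValueError("W must have len(x) columns")
    x_reg: List[List[Optional[int]]] = [[None] * cols for _ in range(rows)]
    ps_reg: List[List[Optional[int]]] = [[None] * cols for _ in range(rows)]
    y: List[Optional[int]] = [None] * rows
    for t in range(rows + cols - 1):
        new_x: List[List[Optional[int]]] = [[None] * cols for _ in range(rows)]
        new_ps: List[List[Optional[int]]] = [[None] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                # Input this cycle: injected at row 0, otherwise from the PE above.
                x_in = (x[j] if t == j else None) if i == 0 else x_reg[i - 1][j]
                if x_in is None:
                    continue  # PE idle this cycle
                ps_in = 0 if j == 0 else ps_reg[i][j - 1]
                new_x[i][j] = x_in
                new_ps[i][j] = mac(ps_in, W[i][j], x_in)
        x_reg, ps_reg = new_x, new_ps  # synchronous register update
        for i in range(rows):
            if ps_reg[i][cols - 1] is not None:
                y[i] = ps_reg[i][cols - 1]  # row i finishes at cycle i + cols - 1
    return y


if __name__ == "__main__":
    print(systolic_mvm([[1, 2, 3], [4, 5, 6]], [7, 8, 9]))  # [50, 122]
```

In this sketch, accumulation happens as each partial sum passes through a row, so every PE is a MAC unit and the whole array finishes after rows + cols - 1 cycles. The skew trades a short fill latency for a regular, nearest-neighbor dataflow in which every PE only communicates with its immediate neighbors.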