🤖 AI Summary
To meet the stringent energy-efficiency and ultra-low-latency requirements of AI-enhanced O-RAN for B5G/6G, this paper proposes HeartStream, a 12 nm FinFET shared-L1-memory cluster architecture integrating 64 RISC-V cores. HeartStream introduces three key innovations: (1) a custom complex-number arithmetic instruction set, (2) hardware-managed systolic queues, and (3) native SIMD support, which together enable unified, efficient co-execution of baseband and AI workloads. Under a strict 4 ms end-to-end latency constraint, HeartStream delivers 243 GFLOP/s of baseband compute and 72 GOP/s of AI inference throughput, achieving 49.6 GFLOP/s/W energy efficiency at only 0.68 W. Crucially, it improves the energy efficiency of critical baseband operators by up to 1.89×, significantly alleviating the performance–power bottleneck in O-RAN edge nodes.
📝 Abstract
We present HeartStream, a 64-RV-core shared-L1-memory cluster (410 GFLOP/s peak performance and 204.8 GB/s L1 bandwidth) for energy-efficient AI-enhanced O-RAN. The cores and cluster architecture are customized for baseband processing, supporting complex-valued (16-bit real and imaginary) multiply-accumulate, division, and square-root instructions, SIMD instructions, and hardware-managed systolic queues, improving the energy efficiency of key baseband kernels by up to 1.89×. At 800 MHz and 0.8 V, HeartStream delivers up to 243 GFLOP/s on complex-valued wireless workloads. Furthermore, the cores also support efficient AI processing of received data at up to 72 GOP/s. HeartStream is fully compatible with base-station power and processing-latency limits: it achieves leading-edge software-defined PUSCH efficiency (49.6 GFLOP/s/W) and consumes just 0.68 W (645 MHz at 0.65 V), within the 4 ms end-to-end constraint for B5G/6G uplink.