A 410GFLOP/s, 64 RISC-V Cores, 204.8GBps Shared-Memory Cluster in 12nm FinFET with Systolic Execution Support for Efficient B5G/6G AI-Enhanced O-RAN

📅 2025-09-10
🤖 AI Summary
To meet the energy-efficiency and latency requirements of AI-enhanced O-RAN for B5G/6G, this paper presents HeartStream, a cluster of 64 RISC-V cores sharing an L1 memory, implemented in 12nm FinFET. The cores and the cluster architecture are customized for baseband processing with (1) complex-valued instructions (16-bit real and imaginary parts) for multiply-accumulate and division/square-root, (2) SIMD instructions, and (3) hardware-managed systolic queues, so that baseband and AI workloads run on the same cores. At 800 MHz and 0.8 V, HeartStream delivers up to 243 GFLOP/s on complex-valued wireless workloads. AI processing on received data reaches up to 72 GOP/s. HeartStream achieves 49.6 GFLOP/s/W on software-defined PUSCH and consumes 0.68 W at 645 MHz and 0.65 V, within the 4 ms end-to-end constraint for B5G/6G uplink. The customizations improve the energy efficiency of key baseband kernels by up to 1.89×.

📝 Abstract
We present HeartStream, a 64-RV-core shared-L1-memory cluster (410 GFLOP/s peak performance and 204.8 GBps L1 bandwidth) for energy-efficient AI-enhanced O-RAN. The cores and cluster architecture are customized for baseband processing, supporting complex-valued (16-bit real and imaginary) multiply-accumulate and division/square-root instructions, SIMD instructions, and hardware-managed systolic queues, which improve the energy efficiency of key baseband kernels by up to 1.89x. At 800 MHz and 0.8 V, HeartStream delivers up to 243 GFLOP/s on complex-valued wireless workloads. Furthermore, the cores support efficient AI processing on received data at up to 72 GOP/s. HeartStream is fully compatible with base station power and processing latency limits: it achieves leading-edge software-defined PUSCH efficiency (49.6 GFLOP/s/W) and consumes just 0.68 W (645 MHz at 0.65 V), within the 4 ms end-to-end constraint for B5G/6G uplink.
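The headline figures in the abstract can be cross-checked with some simple arithmetic. The sketch below is only a back-of-envelope calculation, not data from the paper. It assumes that the peak performance and L1 bandwidth are quoted at the 800 MHz operating point, and that the 49.6 GFLOP/s/W and 0.68 W figures describe the same PUSCH run; neither assumption is stated explicitly in the abstract.

```python
# Back-of-envelope check of the abstract's headline figures.
# Assumption: peak performance and L1 bandwidth are quoted at 800 MHz.
CORES = 64
FREQ_HZ = 800e6
PEAK_FLOPS = 410e9          # 410 GFLOP/s peak
L1_BW_BYTES = 204.8e9       # 204.8 GB/s L1 bandwidth
BASEBAND_FLOPS = 243e9      # 243 GFLOP/s on complex-valued workloads

core_cycles_per_s = CORES * FREQ_HZ

# Peak work and L1 traffic available to each core in each cycle.
flop_per_cycle_per_core = PEAK_FLOPS / core_cycles_per_s
bytes_per_cycle_per_core = L1_BW_BYTES / core_cycles_per_s

# Fraction of peak reached on complex-valued wireless workloads.
utilization = BASEBAND_FLOPS / PEAK_FLOPS

# Assumption: efficiency and power refer to the same PUSCH measurement,
# so their product is the throughput sustained during that run.
pusch_gflops = 49.6 * 0.68

print(f"{flop_per_cycle_per_core:.2f} FLOP/cycle/core")
print(f"{bytes_per_cycle_per_core:.2f} B/cycle/core")
print(f"{utilization:.1%} of peak")
print(f"{pusch_gflops:.1f} GFLOP/s at 0.68 W")
```

Under these assumptions the figures come out to roughly 8 FLOP and 4 bytes of L1 traffic per core per cycle, with the 243 GFLOP/s baseband result sitting at about 59% of peak.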
Problem

Research questions and friction points this paper is trying to address.

Designing energy-efficient AI-enhanced O-RAN baseband processors
Optimizing systolic execution for B5G/6G wireless workloads
Meeting the power and latency constraints of base stations
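To make the idea of systolic execution concrete, here is a minimal software model of processing elements connected by FIFO queues. It is a generic illustration, not HeartStream's mechanism: the summary does not describe how the hardware-managed queues work, and the function name `systolic_dot` and the one-hop-per-cycle timing are assumptions made for this sketch.

```python
from collections import deque

def systolic_dot(weights, vectors):
    """Stream vectors through a chain of PEs; PE i adds weights[i] * v[i].

    Hypothetical model: queues[i] feeds PE i, queues[n] collects results.
    Returns the dot products and the number of simulated cycles.
    """
    n = len(weights)
    queues = [deque() for _ in range(n + 1)]
    for v in vectors:
        queues[0].append((v, 0))  # each item carries its partial sum

    cycles = 0
    while len(queues[n]) < len(vectors):
        # Visit PEs back to front so every item advances one stage per cycle.
        for i in reversed(range(n)):
            if queues[i]:
                v, partial = queues[i].popleft()
                queues[i + 1].append((v, partial + weights[i] * v[i]))
        cycles += 1

    return [partial for _, partial in queues[n]], cycles

results, cycles = systolic_dot([1, 2, 3], [(1, 1, 1), (2, 0, 1)])
print(results, cycles)
```

In this model the pipeline fills once and then emits one result per cycle, so a stream of `m` vectors through `n` stages takes `n + m - 1` cycles. Partial sums move directly from one stage to the next through the queues instead of being written back to a shared buffer between stages.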
Innovation

Methods, ideas, or system contributions that make the work stand out.

64 RISC-V cores with shared L1 memory
Systolic execution support for AI-enhanced O-RAN
Complex-valued instructions for baseband processing
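As an illustration of what a complex-valued multiply-accumulate computes, the sketch below models the operation on (real, imaginary) pairs. It is a hedged software model, not the paper's instruction semantics: the helper names `to_fp16` and `cmac` are hypothetical, and treating the 16-bit components as IEEE 754 half-precision floats is an assumption based only on the throughput being reported in GFLOP/s.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def cmac(acc, a, b):
    """Return acc + a * b for complex numbers given as (re, im) pairs.

    Assumption: only the final real and imaginary results are rounded to
    16 bits; real hardware may round intermediate values differently.
    """
    ar, ai = a
    br, bi = b
    cr, ci = acc
    # 4 real multiplies and 4 real additions/subtractions: 8 FLOPs in total.
    re = cr + ar * br - ai * bi
    im = ci + ar * bi + ai * br
    return to_fp16(re), to_fp16(im)

def cdot(xs, ys):
    """Complex dot product built from repeated complex MACs."""
    acc = (0.0, 0.0)
    for a, b in zip(xs, ys):
        acc = cmac(acc, a, b)
    return acc

print(cmac((0.0, 0.0), (1.0, 2.0), (3.0, 4.0)))
```

A single complex MAC expands to eight real floating-point operations, which is why dedicated complex instructions can reduce instruction count on baseband kernels that are dominated by complex dot products.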