🤖 AI Summary
Existing RISC-V vector processors face large die area, low energy efficiency, and limited throughput. This work designs and open-sources the first implementation of the RISC-V V 1.0 vector extension: a high-performance, lane-based processor with tightly coupled scalar and vector units. It proposes a PPA (power, performance, area)-driven scalar-vector co-designed microarchitecture that integrates a customized floating-point unit (FPU) pipeline with an optimized register file organization to address the throughput and energy-efficiency bottlenecks of conventional RVV implementations. Compared with state-of-the-art vector engines that implement older RVV versions, the design reports 15% better area and 6% higher throughput, and its FPU utilization exceeds 98.5% on key kernels.
📝 Abstract
Vector architectures are gaining traction for highly efficient processing of data-parallel workloads, driven by all major ISAs (RISC-V, Arm, Intel) and boosted by landmark chips such as the Arm SVE-based Fujitsu A64FX, which powers the TOP500 leader Fugaku. The RISC-V V extension has recently reached 1.0-Frozen status. Here, we present its first open-source implementation, discuss the new specification's impact on the micro-architecture of a lane-based design, and provide insights on the performance-oriented design of coupled scalar-vector processors. Our system achieves comparable or better PPA than state-of-the-art vector engines that implement older RVV versions: 15% better area, 6% improved throughput, and FPU utilization >98.5% on crucial kernels.