Unlimited Vector Processing for Wireless Baseband Based on RISC-V Extension

📅 2025-04-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Conventional vector architectures for wireless baseband processing suffer from limited vector register capacity, rigid power-of-two vector length multipliers, and architecture-specific permutation support. To address these limitations, this paper proposes unlimited vector processing (UVP), an instruction set extension for RISC-V. UVP introduces a programming model that supports non-power-of-two register grouping and hardware strip-mining; divides vector instructions into symmetric and asymmetric classes, each with specialized load/store strategies; and pairs a robust permutation engine with pipelines optimized for fixed-point multiplication and division. In evaluation, UVP achieves up to 3.0× and 2.1× speedups over lane-based vector architectures for matrix multiplication and FFT, respectively. The synthesized RTL for a 16-lane configuration in SMIC 40 nm technology occupies 0.94 mm² and achieves an area efficiency of 21.2 GOPS/mm².

📝 Abstract

Wireless baseband processing (WBP) serves as an ideal scenario for utilizing vector processing, which excels in managing data-parallel operations due to its parallel structure. However, conventional vector architectures face certain constraints such as limited vector register sizes, reliance on power-of-two vector length multipliers, and vector permutation capabilities tied to specific architectures. To address these challenges, we have introduced an instruction set extension (ISE) based on RISC-V known as unlimited vector processing (UVP). This extension enhances both the flexibility and efficiency of vector computations. UVP employs a novel programming model that supports non-power-of-two register groupings and hardware strip-mining, thus enabling smooth handling of vectors of varying lengths while reducing the software strip-mining burden. Vector instructions are categorized into symmetric and asymmetric classes, complemented by specialized load/store strategies to optimize execution. Moreover, we present a hardware implementation of UVP featuring sophisticated hazard detection mechanisms, optimized pipelines for symmetric tasks such as fixed-point multiplication and division, and a robust permutation engine for effective asymmetric operations. Comprehensive evaluations demonstrate that UVP significantly enhances performance, achieving up to 3.0$\times$ and 2.1$\times$ speedups in matrix multiplication and fast Fourier transform (FFT) tasks, respectively, when measured against lane-based vector architectures. Our synthesized RTL for a 16-lane configuration using SMIC 40 nm technology spans 0.94 mm$^2$ and achieves an area efficiency of 21.2 GOPS/mm$^2$.
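To illustrate the "software strip-mining burden" the abstract refers to, here is a minimal Python sketch of the loop that software typically has to write when a vector is longer than the hardware can process in one instruction. It is not taken from the paper: the function name `strip_mined_add`, the parameter `max_vl`, and the value 16 are hypothetical, chosen only for illustration. According to the abstract, UVP moves this chunking into hardware.

```python
def strip_mined_add(a, b, max_vl=16):
    """Element-wise add, processed in chunks of at most max_vl elements.

    max_vl is a hypothetical hardware vector-length limit (an assumption
    for this sketch, not a UVP parameter).
    """
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")

    n = len(a)
    out = [0] * n
    chunk_lengths = []  # record of the vector length used in each pass

    i = 0
    while i < n:
        # Software picks the length of this strip: a full chunk, or the
        # shorter remainder on the last pass.
        vl = min(max_vl, n - i)
        # Stand-in for one vector instruction operating on vl elements.
        out[i:i + vl] = [x + y for x, y in zip(a[i:i + vl], b[i:i + vl])]
        chunk_lengths.append(vl)
        i += vl

    return out, chunk_lengths


if __name__ == "__main__":
    result, chunks = strip_mined_add(list(range(37)), [1] * 37)
    print(chunks)
```

A 37-element vector does not divide evenly into 16-element strips, so the loop needs a shorter final pass. This bookkeeping (length selection, pointer advance, remainder handling) is the per-loop overhead that hardware strip-mining is meant to remove from the program.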

Problem

Research questions and friction points this paper is trying to address.

Enhancing flexibility in wireless baseband vector processing

Overcoming limitations of conventional vector architectures

Improving efficiency with RISC-V based UVP extension

Innovation

Methods, ideas, or system contributions that make the work stand out.

RISC-V extension enables unlimited vector processing

Novel programming model supports flexible vector lengths

Optimized hardware with hazard detection and pipelines

🔎 Similar Papers

A Hierarchical Dataflow-Driven Heterogeneous Architecture for Wireless Baseband Processing