🤖 AI Summary
The RISC-V Vector Extension (RVV) is constrained by a static vector register count and fixed power-of-two register grouping, which force strip-mining for long vectors and degrade parallel efficiency. This paper proposes Zoozve, a RISC-V vector extension that eliminates strip-mining by supporting arbitrary vector register counts and lengths together with dynamic, fine-grained grouping for data-adaptive alignment. Its core innovation lies in overcoming RVV's static architectural constraints via a configurable vector register file, a dynamic register allocation mechanism, and LLVM compiler integration. Evaluated on FFT benchmarks, Zoozve achieves at least a 10.10× reduction in dynamic instruction count with only a 5.2% increase in overall silicon area. This design significantly enhances parallelism and energy efficiency for long-vector computations while maintaining full backward compatibility with existing RVV software stacks.
📝 Abstract
Vector processing is crucial for boosting processor performance and efficiency, particularly with data-parallel tasks. The RISC-V "V" Vector Extension (RVV) enhances algorithm efficiency by supporting vector registers of dynamic sizes and their grouping. Nevertheless, for very long vectors, the static number of RVV vector registers and its power-of-two grouping can lead to performance restrictions. To counteract this limitation, this work introduces Zoozve, a RISC-V vector instruction extension that eliminates the need for strip-mining. Zoozve allows for flexible vector register length and count configurations to boost data computation parallelism. With a data-adaptive register allocation approach, Zoozve permits any register groupings and accurately aligns vector lengths, cutting down register overhead and alleviating performance declines from strip-mining. Additionally, the paper details Zoozve's compiler and hardware implementations using LLVM and SystemVerilog. Initial results indicate Zoozve yields a minimum 10.10$\times$ reduction in dynamic instruction count for fast Fourier transform (FFT), with a mere 5.2% increase in overall silicon area.
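To make the strip-mining and power-of-two grouping costs mentioned above concrete, the following is a minimal sketch of the standard RVV arithmetic only; it does not model Zoozve. It relies on the RVV relation VLMAX = LMUL × VLEN / SEW and on integer LMUL being restricted to 1, 2, 4, or 8 (fractional LMUL is ignored here). The function names and the example VLEN of 128 bits are illustrative assumptions, not values taken from the paper.

```python
def vlmax(vlen_bits: int, sew_bits: int, lmul: int) -> int:
    """Elements held by one RVV register group: VLMAX = LMUL * VLEN / SEW."""
    return lmul * vlen_bits // sew_bits


def strip_mine_iterations(n: int, vlen_bits: int, sew_bits: int, lmul: int = 8) -> int:
    """Number of strip-mined loop passes needed to cover n elements.

    Each pass handles at most VLMAX elements, so the count is ceil(n / VLMAX).
    """
    per_pass = vlmax(vlen_bits, sew_bits, lmul)
    return -(-n // per_pass)  # integer ceiling division


def pow2_group(regs_needed: int) -> int:
    """Smallest integer LMUL in {1, 2, 4, 8} that covers regs_needed registers.

    Hypothetical helper: shows how power-of-two grouping rounds a request up.
    """
    for lmul in (1, 2, 4, 8):
        if lmul >= regs_needed:
            return lmul
    raise ValueError("more than 8 registers per group: the loop must be strip-mined")


if __name__ == "__main__":
    # Assumed configuration: VLEN = 128 bits, 32-bit elements, LMUL = 8.
    print(vlmax(128, 32, 8))                        # elements per pass
    print(strip_mine_iterations(1024, 128, 32, 8))  # passes for 1024 elements
    print(pow2_group(5))                            # group size granted for 5 registers
```

Under these assumed parameters, a 1024-element vector of 32-bit values needs 32 strip-mined passes, and a computation that needs 5 registers is rounded up to a group of 8, leaving 3 registers of the group unused. These are the two overheads that the abstract says Zoozve's arbitrary grouping and length alignment aim to remove.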