AI Summary
RISC-V GPUs lack native hardware support for the warp-level primitives essential to modern GPU programming. To address this, this work takes a holistic hardware–software co-design approach: (1) a custom hardware warp scheduler is designed to accelerate warp-level operations, and (2) a lightweight software emulation layer is implemented in the LLVM compiler infrastructure. Evaluated on the open-source Vortex GPU architecture, the two approaches are rigorously compared across performance, silicon area, and software compatibility. The hardware solution achieves up to a 4× geomean IPC improvement on microbenchmarks; the software solution delivers full functional coverage while significantly reducing area overhead. This study establishes the practical feasibility and scalability limits of warp-level abstractions in RISC-V GPU designs, providing foundational insights and empirical guidance for developing high-performance, open-source GPU architectures that support modern parallel programming models.
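To make the "software emulation layer" concrete, the sketch below models, in plain Python, the semantics of one representative warp-level primitive: a shuffle-down followed by a butterfly sum reduction, in the style of CUDA's `__shfl_down_sync`. The function names, the warp size of 32, and the out-of-range behavior are illustrative assumptions for exposition; they are not the paper's actual LLVM emulation code.

```python
# Illustrative software emulation of a warp shuffle-down primitive,
# operating on a list of per-lane register values. Hypothetical names;
# not taken from the paper's implementation.

WARP_SIZE = 32

def shfl_down_sync(lane_values, delta):
    """Each lane i reads the value held by lane i + delta.
    Lanes whose source lane falls outside the warp keep their own
    value (mirroring CUDA's out-of-range behavior)."""
    n = len(lane_values)
    return [lane_values[i + delta] if i + delta < n else lane_values[i]
            for i in range(n)]

def warp_reduce_sum(lane_values):
    """Butterfly reduction built from shuffle-down steps: after
    log2(WARP_SIZE) iterations, lane 0 holds the sum of all lanes."""
    delta = WARP_SIZE // 2
    while delta > 0:
        shifted = shfl_down_sync(lane_values, delta)
        lane_values = [a + b for a, b in zip(lane_values, shifted)]
        delta //= 2
    return lane_values[0]

# Lane 0 accumulates the total across the whole warp.
print(warp_reduce_sum(list(range(WARP_SIZE))))  # 0+1+...+31 = 496
```

A hardware warp scheduler can perform the lane-to-lane exchange in a single instruction, whereas an emulation like this lowers it to ordinary loads, stores, and arithmetic, which is the performance gap the paper's evaluation quantifies.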
Abstract
RISC-V GPUs present a promising path for supporting GPU applications. Traditionally, GPUs achieve high efficiency through the SPMD (Single Program, Multiple Data) programming model. However, modern GPU programming increasingly relies on warp-level features, which diverge from the conventional SPMD paradigm. In this paper, we explore how RISC-V GPUs can support these warp-level features, both through hardware implementation and via software-only approaches. Our evaluation shows that a hardware implementation achieves up to a 4× geomean IPC speedup on microbenchmarks, while software-based solutions provide a viable alternative for area-constrained scenarios.