🤖 AI Summary
This work addresses the limitations of lightweight dual-issue RISC-V cores, which are often hindered by programming model complexity and substantial software overhead. To overcome these challenges, the authors introduce a lightweight hardware queue into the in-order RISC-V core Snitch, enabling fine-grained communication and synchronization between integer and floating-point threads. They further propose COPIFTv2, an enhanced programming model that eliminates the tiling and software pipelining mechanisms of the original COPIFT, thereby significantly reducing both programming complexity and runtime overhead. Experimental results demonstrate that COPIFTv2 achieves up to a 1.49× speedup and a 1.47× energy efficiency improvement over the original COPIFT, with a peak IPC of 1.81. The implementation is open source, and all results are reproducible.
📝 Abstract
Large-scale ML accelerators rely on large numbers of PEs, imposing strict bounds on the area and energy budget of each PE. Prior work demonstrates that limited dual-issue capabilities can be efficiently integrated into a lightweight in-order open-source RISC-V core (Snitch), with a geomean IPC boost of 1.6x and a geomean energy efficiency gain of 1.3x, obtained by concurrently executing integer and FP instructions. Unfortunately, this required a complex and error-prone low-level programming model (COPIFT). We introduce COPIFTv2, which augments Snitch with lightweight queues enabling direct, fine-grained communication and synchronization between integer and FP threads. By eliminating the tiling and software pipelining steps of COPIFT, we remove much of its complexity and software overhead. As a result, COPIFTv2 achieves up to a 1.49x speedup and a 1.47x energy-efficiency gain over COPIFT, and a peak IPC of 1.81. Overall, COPIFTv2 significantly enhances the efficiency and programmability of dual-issue execution on lightweight cores. Our implementation is fully open source and performance experiments are reproducible using free software.