Late Breaking Results: A RISC-V ISA Extension for Chaining in Scalar Processors

📅 2025-03-26

🤖 AI Summary

Scalar in-order processors used under tight energy-efficiency constraints are highly sensitive to pipeline stalls caused by data dependencies, and conventional loop unrolling mitigates these stalls only at the cost of increased register pressure. Method: This paper proposes Scalar Chaining Execution (SCE), a hardware-software co-design mechanism enabled by custom RISC-V instruction-set extensions. SCE automatically chains dependent instructions in hardware, hiding latency without software-level loop unrolling. Contribution/Results: By tightly integrating hardware execution support with compiler-aware scheduling, SCE retains flexibility under strict register constraints while improving efficiency. Evaluated on stencil workloads, SCE achieves over 93% FPU utilization, delivering an average 4% performance gain and 10% energy-efficiency improvement over highly optimized baselines. The complete design, including RTL, toolchain, and benchmarks, is open source and experimentally reproducible.
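The latency-hiding trade-off behind loop unrolling can be made concrete with a toy cycle model. This is a minimal sketch under assumptions not taken from the paper: a single-issue in-order core, a fully pipelined FPU with a fixed latency (4 cycles in the example), and a reduction whose FMAs are split round-robin across `unroll` independent accumulator registers, as software unrolling would do. It does not model SCE itself, and the function name `fma_chain_stalls` is hypothetical.

```python
def fma_chain_stalls(n_ops: int, unroll: int, fpu_latency: int) -> tuple[int, float]:
    """Stall cycles and FPU issue-slot utilization on a toy in-order core.

    Hypothetical model: one instruction issues per cycle, in order; an FMA
    result is usable `fpu_latency` cycles after issue; FMA i accumulates into
    register i % unroll, so each extra accumulator costs one FP register.
    """
    if unroll < 1:
        raise ValueError("unroll must be at least 1")
    ready = [0] * unroll  # cycle at which each accumulator's latest value is available
    next_issue = 0        # earliest cycle the in-order issue stage is free
    stalls = 0
    for i in range(n_ops):
        acc = i % unroll
        issue = max(next_issue, ready[acc])  # wait for the RAW dependency on acc
        stalls += issue - next_issue
        ready[acc] = issue + fpu_latency
        next_issue = issue + 1
    busy_window = n_ops + stalls  # cycles from the first to the last issue, inclusive
    utilization = n_ops / busy_window if busy_window else 0.0
    return stalls, utilization


if __name__ == "__main__":
    for unroll in (1, 2, 4):
        stalls, util = fma_chain_stalls(n_ops=8, unroll=unroll, fpu_latency=4)
        print(f"unroll={unroll}: {stalls} stall cycles, FPU utilization {util:.2f}")
    # unroll=1: 21 stall cycles, FPU utilization 0.28
    # unroll=2: 6 stall cycles, FPU utilization 0.57
    # unroll=4: 0 stall cycles, FPU utilization 1.00
```

In this model, full utilization needs as many independent accumulators as the FPU latency, and every one of them occupies an architectural register; that register cost is the drawback scalar chaining is proposed to avoid.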

📝 Abstract

Modern general-purpose accelerators integrate a large number of programmable, area- and energy-efficient processing elements (PEs) to deliver high performance while meeting stringent power-delivery and thermal-dissipation constraints. In this context, PEs are often implemented by scalar in-order cores, which are highly sensitive to pipeline stalls. Traditional software techniques, such as loop unrolling, mitigate the issue at the cost of increased register pressure, limiting flexibility. We propose scalar chaining, a novel hardware-software solution, to address this issue without incurring the drawbacks of traditional software-only techniques. We demonstrate our solution on register-limited stencil codes, achieving >93% FPU utilization and a 4% speedup and 10% higher energy efficiency, on average, over highly optimized baselines. Our implementation is fully open source, and performance experiments are reproducible using free software.
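Why stencil codes become register-limited can be illustrated with a back-of-the-envelope register budget. The sketch below is hypothetical: it assumes a 1D stencil whose coefficients all stay resident in FP registers, one accumulator plus one input temporary per unrolled output, and a 4-cycle FPU latency. Only the 32-entry FP register file (f0 to f31 in the RISC-V F/D extensions) is an architectural fact; the names `max_unroll` and `fpu_utilization_bound` are illustrative.

```python
NUM_FP_REGS = 32  # RISC-V F/D extensions define 32 FP registers, f0 to f31


def max_unroll(num_taps: int, temps_per_output: int = 1, reserved: int = 0) -> int:
    """Largest unroll factor whose live FP registers fit the register file.

    Hypothetical register model: all `num_taps` coefficients stay resident,
    and each unrolled output needs one accumulator plus `temps_per_output`
    registers for loaded inputs; `reserved` registers are unavailable.
    """
    free = NUM_FP_REGS - reserved - num_taps
    per_output = 1 + temps_per_output
    return max(free // per_output, 0)


def fpu_utilization_bound(unroll: int, fpu_latency: int) -> float:
    """Steady-state bound: `unroll` independent chains fill `unroll` of every `fpu_latency` cycles."""
    if unroll <= 0:
        return 0.0
    return min(1.0, unroll / fpu_latency)


if __name__ == "__main__":
    for taps in (3, 7, 27):
        u = max_unroll(taps)
        print(f"taps={taps}: unroll<={u}, utilization bound {fpu_utilization_bound(u, 4):.2f}")
    # taps=3: unroll<=14, utilization bound 1.00
    # taps=7: unroll<=12, utilization bound 1.00
    # taps=27: unroll<=2, utilization bound 0.50
```

Under these assumptions, wide stencils leave too few free registers to unroll far enough to cover the FPU latency, which is the register-limited regime the abstract targets.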

Problem

Research questions and friction points this paper is trying to address.

Address pipeline stalls in scalar in-order cores

Hide latency without the register pressure of loop unrolling

Improve FPU utilization and energy efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

RISC-V ISA extension for scalar chaining

Hardware-software solution for pipeline stalls

Open-source implementation with high efficiency
