Microarchitecture Design and Benchmarking of Custom SHA-3 Instruction for RISC-V

📅 2025-08-28
🤖 AI Summary
To address the lack of efficient hardware support for SHA-3 on RISC-V, this work integrates the Keccak permutation as a custom instruction in a general-purpose CPU microarchitecture rather than as a standalone coprocessor. It examines the challenges of embedding a complex, multistage operation with a distinct permutation-based structure and memory access pattern, focusing on pipelined execution, storage utilization, and hardware cost. The design is evaluated with cycle-accurate GEM5 simulation and FPGA prototyping. Results show speedups of up to 8.02× over RISC-V optimized SHA-3 software and up to 46.31× over Keccak-specific software, at a cost of 15.09% more registers and 11.51% more LUTs. The findings offer practical design considerations for future cryptographic instruction set extensions.

📝 Abstract
Integrating cryptographic accelerators into modern CPU architectures presents unique microarchitectural challenges, particularly when extending instruction sets with complex, multistage operations. Hardware-assisted cryptographic instructions, such as Intel's AES-NI and ARM's custom instructions for encryption workloads, have demonstrated substantial performance improvements. However, efficient SHA-3 acceleration remains an open problem due to its distinct permutation-based structure and memory access patterns. Existing solutions primarily rely on standalone coprocessors or software optimizations, often avoiding the complexities of direct microarchitectural integration. This study investigates the architectural challenges of embedding a SHA-3 permutation operation as a custom instruction within a general-purpose processor, focusing on pipelined simultaneous execution, storage utilization, and hardware cost. We investigate and prototype a SHA-3 custom instruction for the RISC-V CPU architecture. Cycle-accurate GEM5 simulations and FPGA prototyping demonstrate performance improvements of up to 8.02x for RISC-V optimized SHA-3 software workloads and up to 46.31x for Keccak-specific software workloads, with only a 15.09% increase in registers and an 11.51% increase in LUT utilization. These findings provide critical insights into the feasibility and impact of SHA-3 acceleration at the microarchitectural level, highlighting practical design considerations for future cryptographic instruction set extensions.

Problem

Research questions and friction points this paper is trying to address.

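Efficient SHA-3 acceleration remains an open problem because of its permutation-based structure and memory access patterns
Existing solutions rely on standalone coprocessors or software optimizations and avoid direct microarchitectural integration
The paper examines pipelined execution, storage utilization, and hardware cost when embedding SHA-3 as a custom instruction

Innovation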

Methods, ideas, or system contributions that make the work stand out.

Custom SHA-3 permutation instruction for the RISC-V CPU architecture
Pipelined simultaneous execution of the multistage permutation
Evaluation with cycle-accurate GEM5 simulations and FPGA prototyping
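
For context on what such an instruction replaces, the sketch below shows the standard Keccak-f[1600] permutation from FIPS 202 in plain Python, wrapped in a minimal SHA3-256 sponge. It is not the paper's hardware design or its software baseline; the names `keccak_f1600`, `sha3_256`, and `rol64` are illustrative assumptions. It only illustrates the 24 rounds of lane-wise work (theta, rho, pi, chi, iota) over a 200-byte state that a software-only implementation performs for every absorbed block.

```python
import hashlib

MASK64 = (1 << 64) - 1


def rol64(value, shift):
    """Rotate a 64-bit lane left by `shift` bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def keccak_f1600(state):
    """Apply the 24-round Keccak-f[1600] permutation to a 200-byte state."""
    # Unpack the state into a 5x5 array of little-endian 64-bit lanes.
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    lfsr = 1  # LFSR that generates the round constants for the iota step
    for _ in range(24):
        # theta: mix each lane with the parities of two neighbouring columns
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to a new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: the only non-linear step, applied along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR a round constant into lane (0, 0)
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    # Pack the lanes back into a 200-byte state.
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out


def sha3_256(message):
    """Minimal SHA3-256 sponge (rate 136 bytes) built on keccak_f1600."""
    rate = 136
    state = bytearray(200)
    offset = 0
    # Absorb every full block, permuting after each one.
    while len(message) - offset >= rate:
        for i in range(rate):
            state[i] ^= message[offset + i]
        state = keccak_f1600(state)
        offset += rate
    # Absorb the final partial block, then apply SHA-3 padding (0x06 ... 0x80).
    tail = message[offset:]
    for i, byte in enumerate(tail):
        state[i] ^= byte
    state[len(tail)] ^= 0x06
    state[rate - 1] ^= 0x80
    state = keccak_f1600(state)
    return bytes(state[:32])


# Cross-check the sketch against the standard library implementation.
assert sha3_256(b"abc") == hashlib.sha3_256(b"abc").digest()
print(sha3_256(b"abc").hex())
```

Each call to `keccak_f1600` here expands into many loads, rotates, XORs, and stores on a scalar core, which is the kind of per-block cost that a dedicated permutation instruction is intended to collapse.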