Energy-Efficient NTT Sampler for Kyber Benchmarked on FPGA

📅 2025-05-03
🤖 AI Summary
To address the high energy consumption and latency of matrix sampling during Kyber key generation on low-power embedded systems (e.g., IoT devices, TPMs, smart cards), this work proposes Modified SampleNTT. It completes polynomial sampling with only two SHAKE-128 squeezes in 99.16% of cases, reducing random-bit consumption by about 33% while preserving the original rejection rate and statistical randomness. Using an Artix-7 FPGA implementation, the work optimizes the NTT architecture, streamlines the rejection logic, and adopts a lightweight SHAKE-128 invocation strategy. Experimental results show that Modified SampleNTT reduces energy consumption by 33.14% and latency by 33.32% compared to conventional SampleNTT, with a marginal 0.28% reduction in slice usage. These improvements make Kyber more energy-efficient and practical in resource-constrained environments.
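The squeeze counts above can be sanity-checked with a short back-of-the-envelope calculation for conventional SampleNTT. This sketch assumes the standard Kyber/FIPS 203 parameters (q = 3329, 256 coefficients per polynomial, 12-bit candidates, and the 168-byte SHAKE-128 rate) and treats the XOF output as uniform; it describes only the conventional baseline, not how Modified SampleNTT works.

```latex
\begin{aligned}
\text{candidates per squeeze block} &= \frac{168 \cdot 8}{12} = 112,
\qquad p_{\text{acc}} = \frac{q}{2^{12}} = \frac{3329}{4096} \approx 0.8127,\\
\text{two blocks} &\;\Rightarrow\; 2 \cdot 112 = 224 < 256
\;\Rightarrow\; \text{a full polynomial is impossible},\\
\mathbb{E}[\text{candidates examined}] &\approx \frac{256}{p_{\text{acc}}}
= \frac{256 \cdot 4096}{3329} \approx 315
\quad (\approx 3780 \text{ bits} \approx 2.81 \text{ blocks}).
\end{aligned}
```

Because SHAKE-128 output is squeezed in whole 168-byte blocks, the conventional sampler therefore needs at least three squeezes per polynomial.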

📝 Abstract
Kyber is a lattice-based key encapsulation mechanism selected for standardization by the NIST Post-Quantum Cryptography (PQC) project. A critical component of Kyber's key generation process is the sampling of matrix elements from a uniform distribution over the ring R_q. This step is one of the most computationally intensive tasks in the scheme and significantly affects performance on low-power embedded systems such as Internet of Things (IoT) devices, wearable devices, wireless sensor networks (WSNs), smart cards, and Trusted Platform Modules (TPMs). Existing approaches to this sampling, notably conventional SampleNTT and Parse-SPDM3, rely on rejection sampling. Both algorithms consume a large number of random bytes, requiring at least three SHAKE-128 squeezing steps per polynomial, which incurs significant latency and energy cost. In this work, we propose a novel and efficient sampling algorithm, Modified SampleNTT, which substantially reduces the average number of bits drawn from SHAKE-128 to generate elements of R_q, achieving approximately a 33% reduction compared to conventional SampleNTT. Modified SampleNTT generates a complete polynomial using only two SHAKE-128 squeezes in 99.16% of cases, outperforming both state-of-the-art methods, which never succeed within two squeezes. Furthermore, our algorithm maintains the same average rejection rate as existing techniques and passes all standard statistical tests for randomness. An FPGA implementation on Artix-7 demonstrates a 33.14% reduction in energy, 33.32% lower latency, and 0.28% fewer slices compared to SampleNTT. Our results confirm that Modified SampleNTT is an efficient and practical alternative for uniform polynomial sampling in PQC schemes such as Kyber, especially for low-power security processors.
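For reference, the conventional SampleNTT baseline that the abstract compares against can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Modified SampleNTT: it follows the FIPS 203 (ML-KEM) rejection rule of splitting each 3-byte chunk into two 12-bit candidates and keeping those below q = 3329, it assumes the 34-byte XOF input ρ‖j‖i, and it additionally reports how many 168-byte SHAKE-128 blocks were consumed. The function name `sample_ntt` and the `max_blocks` cap are hypothetical, illustrative choices.

```python
import hashlib

Q = 3329      # Kyber / ML-KEM modulus q
N = 256       # coefficients per polynomial
RATE = 168    # SHAKE-128 rate in bytes: one squeeze block


def sample_ntt(rho: bytes, i: int, j: int, max_blocks: int = 16):
    """Sketch of conventional SampleNTT rejection sampling (hypothetical helper).

    Returns (coefficients, number of 168-byte SHAKE-128 blocks consumed).
    """
    if len(rho) != 32:
        raise ValueError("rho must be 32 bytes")
    # Assumed XOF input: the 34-byte string rho || j || i, as in FIPS 203.
    # SHAKE output is prefix-consistent, so reading max_blocks * RATE bytes
    # at once gives the same bytes as squeezing block by block.
    stream = hashlib.shake_128(rho + bytes([j, i])).digest(RATE * max_blocks)
    coeffs = []
    pos = 0
    while len(coeffs) < N:
        if pos + 3 > len(stream):
            raise RuntimeError("XOF output exhausted; increase max_blocks")
        b0, b1, b2 = stream[pos], stream[pos + 1], stream[pos + 2]
        pos += 3
        d1 = b0 + 256 * (b1 % 16)   # low 12 bits of the 24-bit chunk
        d2 = (b1 // 16) + 16 * b2   # high 12 bits of the 24-bit chunk
        if d1 < Q:                  # reject candidates >= q
            coeffs.append(d1)
        if d2 < Q and len(coeffs) < N:
            coeffs.append(d2)
    blocks_used = -(-pos // RATE)   # ceil(pos / RATE)
    return coeffs, blocks_used
```

Usage: `coeffs, blocks = sample_ntt(bytes(32), 0, 0)`. Since 168 is a multiple of 3, no 3-byte chunk straddles two blocks, and `blocks` is always at least 3 because two blocks hold only 224 twelve-bit candidates.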
Problem

Reducing energy and latency in Kyber's matrix sampling
Optimizing SHAKE-128 usage for polynomial generation
Improving efficiency for low-power PQC systems
Innovation

Modified SampleNTT reduces SHAKE-128 bit usage by about 33%
Achieves 99.16% success with two SHAKE-128 squeezes
FPGA implementation shows about 33% lower energy and latency
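To see how often a given number of squeezes suffices for conventional SampleNTT, one can model each 12-bit candidate as independent and uniform (an idealization of the SHAKE-128 output) and sum a binomial tail: k blocks suffice exactly when at least 256 of the first 112k candidates fall below q = 3329. The sketch below computes this exactly with integer arithmetic. It models only the conventional baseline under that idealization; the helper name `prob_done_within` is hypothetical, and nothing here reproduces the Modified SampleNTT construction.

```python
from fractions import Fraction
from math import comb

Q, N = 3329, 256
CANDIDATES_PER_BLOCK = 168 * 8 // 12   # 112 twelve-bit candidates per SHAKE-128 block


def prob_done_within(blocks: int) -> Fraction:
    """P(conventional SampleNTT fills all N coefficients within `blocks` squeezes),
    assuming each 12-bit candidate is independent and uniform on [0, 4096)."""
    n = blocks * CANDIDATES_PER_BLOCK
    # Count outcomes (weighted by Q accept / 4096 - Q reject choices) with >= N accepts.
    favourable = sum(comb(n, k) * Q**k * (4096 - Q)**(n - k) for k in range(N, n + 1))
    return Fraction(favourable, 4096**n)


for b in range(2, 6):
    print(b, float(prob_done_within(b)))
```

Under this model the two-block probability is exactly zero (the sum is empty because 224 < 256), matching the statement that the conventional method never succeeds in two squeezes; using exact fractions avoids floating-point underflow in the binomial terms.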
Authors

P. Baidya
Department of Mathematics, National Institute of Technology, Jamshedpur, India; Department of Computer Science and Engineering, Siksha ‘O’ Anusandhan Deemed to be University, Bhubaneswar, India

Rourab Paul
Shiv Nadar University Chennai
Reconfigurable Hardware, Post-Quantum Cryptography, Embedded Systems, Multi-Core, Blockchain

Vikas Srivastava
Department of Mathematics, Indian Institute of Technology Madras, Chennai, India

S. Debnath
Department of Mathematics, National Institute of Technology, Jamshedpur, India