🤖 AI Summary
To address the high energy consumption and latency of matrix sampling during key generation in Kyber on low-power embedded systems (e.g., IoT devices, TPMs, smart cards), this work proposes Modified SampleNTT. It completes polynomial sampling with only two SHAKE-128 squeezes in 99.16% of cases, reducing average random-bit consumption by about 33% while preserving the original rejection rate and statistical randomness. In an Artix-7 FPGA implementation, the authors optimize the NTT architecture, streamline the rejection logic, and adopt a lightweight SHAKE-128 invocation strategy. Experimental results show that Modified SampleNTT reduces energy consumption by 33.14% and latency by 33.32% compared to conventional SampleNTT, with a marginal 0.28% reduction in logic resource usage. These improvements make Kyber more energy-efficient and practical in resource-constrained environments.
📝 Abstract
Kyber is a lattice-based key encapsulation mechanism selected for standardization by the NIST Post-Quantum Cryptography (PQC) project. A critical component of Kyber's key generation process is the sampling of matrix elements from a uniform distribution over the ring R_q. This step is one of the most computationally intensive tasks in the scheme and significantly impacts performance in low-power embedded systems such as Internet of Things (IoT) devices, wearable devices, wireless sensor networks (WSNs), smart cards, and Trusted Platform Modules (TPMs). Existing approaches to this sampling, notably conventional SampleNTT and Parse-SPDM3, rely on rejection sampling. Both algorithms consume a large number of random bytes and therefore need at least three SHAKE-128 squeezing steps per polynomial, which incurs significant latency and energy consumption. In this work, we propose a novel and efficient sampling algorithm, Modified SampleNTT, which substantially reduces the average number of bits required from SHAKE-128 to generate elements in R_q, achieving approximately a 33% reduction compared to conventional SampleNTT. Modified SampleNTT generates a complete polynomial using only two SHAKE-128 squeezes in 99.16% of cases, outperforming both state-of-the-art methods, which never succeed within two squeezes. Furthermore, our algorithm maintains the same average rejection rate as existing techniques and passes all standard statistical tests for randomness quality. An FPGA implementation on Artix-7 demonstrates a 33.14% reduction in energy, 33.32% lower latency, and 0.28% fewer slices compared to SampleNTT. Our results confirm that Modified SampleNTT is an efficient and practical alternative for uniform polynomial sampling in PQC schemes such as Kyber, especially for low-power security processors.
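To make the three-squeeze baseline concrete, the following is a minimal Python sketch of conventional SampleNTT as it is commonly described for Kyber (ML-KEM), not of the proposed Modified SampleNTT, whose details the abstract does not give. It assumes q = 3329, 256 coefficients per polynomial, 12-bit candidates packed two per three bytes, and a SHAKE-128 rate of 168 bytes per squeeze. The function name `sample_ntt`, the `max_blocks` cap, and the all-zero seed are illustrative choices. Under these assumptions, two squeezes yield 336 bytes, i.e. 112 triples and only 224 candidates, fewer than the 256 coefficients required, so the conventional algorithm can never finish in two squeezes.

```python
import hashlib

Q = 3329              # Kyber modulus
N = 256               # coefficients per polynomial
SHAKE128_RATE = 168   # bytes produced per SHAKE-128 squeeze


def sample_ntt(seed: bytes, max_blocks: int = 8):
    """Conventional rejection sampling of one polynomial (illustrative sketch).

    Returns (coefficients, bytes_consumed, squeezes_needed).
    """
    # hashlib has no incremental squeeze, so request a bounded stream up front.
    stream = hashlib.shake_128(seed).digest(max_blocks * SHAKE128_RATE)
    coeffs = []
    pos = 0
    while len(coeffs) < N:
        if pos + 3 > len(stream):
            raise RuntimeError("stream exhausted; increase max_blocks")
        b0, b1, b2 = stream[pos], stream[pos + 1], stream[pos + 2]
        pos += 3
        # Three bytes carry two 12-bit candidates.
        d1 = b0 + 256 * (b1 % 16)
        d2 = (b1 // 16) + 16 * b2
        # Keep a candidate only if it is a valid residue modulo Q.
        if d1 < Q:
            coeffs.append(d1)
        if d2 < Q and len(coeffs) < N:
            coeffs.append(d2)
    squeezes = -(-pos // SHAKE128_RATE)  # ceiling division
    return coeffs, pos, squeezes


if __name__ == "__main__":
    # In Kyber the XOF input is a 32-byte seed plus two index bytes;
    # an all-zero 34-byte value is used here purely for illustration.
    coeffs, used, squeezes = sample_ntt(bytes(34))
    print(len(coeffs), used, squeezes)
```

Each 12-bit candidate is accepted with probability 3329/4096, roughly 81%, so about 315 candidates (around 473 bytes) are needed on average, which is why the third squeeze is normally sufficient for the conventional algorithm.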