Bi-SamplerZ: A Hardware-Efficient Gaussian Sampler Architecture for Quantum-Resistant Falcon Signatures

2025-05-30
AI Summary
The core bottleneck of the Falcon post-quantum signature scheme lies in its computationally intensive discrete Gaussian sampling (DGS). To address this, the paper proposes Bi-SamplerZ, a fully hardware-implemented dual-path DGS architecture. Bi-SamplerZ uses a cooperative dual-datapath design that exploits both the paired invocation pattern of SamplerZ and the coefficient correlation and inherent properties of rejection sampling. It gains efficiency from fine-grained pipelining and efficient control coordination, and is provided as both ASIC and FPGA implementations. Compared with state-of-the-art full hardware implementations, Bi-SamplerZ reduces the sampling cycle count by 54.1% and achieves the best-known area-time product (ATP) among fully hardware-based sampler designs. It also reports the lowest sampling latency to date among existing designs.

Abstract
Falcon is a standardized quantum-resistant digital signature scheme that offers advantages over other schemes but has a more complex signature generation process. This paper presents Bi-SamplerZ, a fully hardware-implemented, high-efficiency dual-path discrete Gaussian sampler designed to accelerate Falcon signature generation. Observing that the SamplerZ subroutine is consistently invoked in pairs during each signature generation, we propose a dual-datapath architecture capable of generating two sampling results simultaneously. To make the best use of coefficient correlation and the inherent properties of rejection sampling, we introduce an assistance mechanism that enables effective collaboration between the two datapaths, rather than simply duplicating the sampling process. Additionally, we incorporate several architectural optimizations over existing designs to further enhance speed, area efficiency, and resource utilization. Experimental results demonstrate that Bi-SamplerZ achieves the lowest sampling latency to date among existing designs, benefiting from fine-grained pipeline optimization and efficient control coordination. Compared with state-of-the-art full hardware implementations, Bi-SamplerZ reduces the sampling cycle count by 54.1% while incurring only a moderate increase in hardware resource consumption, thereby achieving the best-known area-time product (ATP) for fully hardware-based sampler designs. In addition, to facilitate comparison with existing works, we provide both ASIC and FPGA implementations. Together, these results highlight the suitability of Bi-SamplerZ as a high-performance sampling engine in standardized post-quantum cryptographic systems such as Falcon.
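The paired-invocation idea in the abstract can be pictured with a small software model. The sketch below is only an illustration under stated assumptions: it uses a generic rejection sampler with a uniform proposal over a truncated window, which is not Falcon's actual SamplerZ algorithm and not the paper's hardware design. The names `sample_z`, `sample_pair`, and `tail`, and the choice of a shared `sigma` for both calls, are hypothetical. It also does not model the assistance mechanism between the two datapaths; the two draws here are simply independent.

```python
import math
import random


def sample_z(mu, sigma, rng, tail=10):
    """Draw one integer from a discrete Gaussian centered at mu.

    Simplified model (assumption): uniform proposal over a window of
    +/- tail*sigma around mu, followed by a rejection step.
    """
    lo = math.floor(mu - tail * sigma)
    hi = math.ceil(mu + tail * sigma)
    while True:
        z = rng.randint(lo, hi)  # uniform candidate in the truncated window
        # Acceptance probability is proportional to the Gaussian weight of z.
        accept_prob = math.exp(-((z - mu) ** 2) / (2 * sigma ** 2))
        if rng.random() < accept_prob:  # rejection step
            return z


def sample_pair(mu0, mu1, sigma, rng):
    """Software stand-in for a paired invocation: two results per call.

    In hardware the two samples would be produced by parallel datapaths;
    here they are drawn one after the other.
    """
    return sample_z(mu0, sigma, rng), sample_z(mu1, sigma, rng)
```

Calling `sample_pair(0.3, -1.7, 1.5, random.Random(0))` returns a tuple of two integers, one per center. A dual-datapath design would aim to produce both in the time a single-path sampler needs for one.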
Problem

Research questions and friction points this paper is trying to address.

Accelerates Falcon signature generation via a hardware-efficient Gaussian sampler
Reduces the sampling cycle count by 54.1% with a dual-datapath architecture
Optimizes the area-time product (ATP) for post-quantum cryptographic systems
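A rough worked bound shows how a 54.1% cycle-count reduction can improve ATP even when area grows. It assumes that ATP is area multiplied by latency and that the clock frequency is the same for both designs. The symbols A, T, A' and T' are notation introduced here for illustration and do not come from the paper.

```latex
\begin{align*}
\mathrm{ATP} &= A \cdot T, \qquad T' = (1 - 0.541)\,T = 0.459\,T \\
\frac{\mathrm{ATP}'}{\mathrm{ATP}} &= \frac{A'}{A} \cdot 0.459 < 1
\iff \frac{A'}{A} < \frac{1}{0.459} \approx 2.18
\end{align*}
```

Under these assumptions, the design improves ATP as long as its area grows by less than a factor of about 2.18.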
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-path architecture for simultaneous sampling
Assistance mechanism for datapath collaboration
Fine-grained pipeline optimization for speed
Binke Zhao
School of Integrated Circuits, University of Electronic Science and Technology of China; exchange student at the System on Chip Center, Department of Computer and Information Engineering, Khalifa University, UAE
Ghada Alsuhi
Department of Electrical and Computer Engineering, System on Chip Center, Khalifa University, UAE
H. Saleh
Department of Electrical and Computer Engineering, System on Chip Center, Khalifa University, UAE
Baker Mohammad
Computer and Information Engineering, Khalifa University
Energy-efficient computing, Hardware accelerators, Neuromorphic computing, Memristor