🤖 AI Summary
This work addresses the high computational cost of implementing the Sigmoid activation function on edge FPGA devices by proposing a high-precision, low-resource hardware acceleration scheme. Leveraging the mathematical equivalence between the Sigmoid and hyperbolic tangent functions, the design integrates normalized inputs with an improved mixed-radix hyperbolic rotation CORDIC (MR-HRC) algorithm. It employs a hybrid radix-2/radix-4 iterative strategy and a fast-convergence mechanism that eliminates the need for scale-factor compensation, further enhanced by a radix-2 linear vectoring CORDIC stage. Implemented with 16-bit fixed-point arithmetic and a fully pipelined architecture on a Xilinx Virtex-7 FPGA, the solution consumes only 835 LUTs and zero DSP slices while achieving a mean absolute error of 4.23×10⁻⁴, outperforming several recent sigmoid implementations.
📝 Abstract
Efficient hardware implementation of nonlinear activation functions is a crucial task in deploying artificial neural networks on resource-constrained and edge devices such as Field-Programmable Gate Arrays (FPGAs). The sigmoid activation function is widely used for probabilistic output, binary classification, and gating mechanisms in recurrent neural networks, despite its reliance on exponential computations.
This paper presents a hardware-efficient FPGA implementation of the sigmoid activation function using a mixed-radix CORDIC-based architecture. The proposed approach leverages the mathematical relationship between the sigmoid and hyperbolic tangent functions, sigmoid(x) = (1 + tanh(x/2)) / 2. The input is normalized to a magnitude of at most 1, so the corresponding tanh computation operates on an argument of magnitude at most 0.5, which significantly improves convergence behavior.
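The relationship between the two functions can be checked numerically. The following is a minimal sketch in floating-point Python, not the paper's hardware design; the function names `sigmoid` and `sigmoid_via_tanh` are illustrative, and the input grid assumes that normalization means |x| ≤ 1.

```python
import math

def sigmoid(x):
    """Reference sigmoid computed directly from the exponential."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_via_tanh(x):
    """Sigmoid rewritten through tanh: sigmoid(x) = (1 + tanh(x / 2)) / 2."""
    return 0.5 * (1.0 + math.tanh(0.5 * x))

# Sample the assumed normalized input range [-1, 1] in steps of 0.1.
inputs = [k / 10.0 for k in range(-10, 11)]

# Largest disagreement between the two formulations over the grid.
max_diff = max(abs(sigmoid(x) - sigmoid_via_tanh(x)) for x in inputs)

# Largest tanh argument that the normalized inputs produce.
max_tanh_arg = max(abs(0.5 * x) for x in inputs)

print(max_diff, max_tanh_arg)
```

Under that assumption the tanh argument never exceeds 0.5 in magnitude, which is the reduced range the abstract refers to.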
To achieve high accuracy with minimal hardware overhead, a modified mixed-radix hyperbolic rotation CORDIC (MR-HRC) algorithm combining radix-2 and radix-4 iterations is introduced. The initial radix-2 stage ensures stable convergence, while the subsequent radix-4 stage accelerates convergence without requiring scale-factor compensation. In the final stage, a radix-2 linear vectoring CORDIC (R2-LVC) computes the hyperbolic tangent by dividing the hyperbolic sine by the hyperbolic cosine, both of which are produced by the MR-HRC algorithm.
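The two-stage structure (hyperbolic rotation, then division by linear vectoring) can be illustrated in software. The sketch below is not the paper's MR-HRC: it uses the standard textbook radix-2 hyperbolic CORDIC for the rotation stage in place of the mixed radix-2/radix-4 scheme, works in floating point rather than 16-bit fixed point, and picks 16 iterations arbitrarily. All function names are hypothetical. In this sketch the CORDIC gain scales the sinh and cosh outputs equally, so it cancels in the ratio.

```python
import math

def hyperbolic_indices(n):
    """Shift indices for radix-2 hyperbolic CORDIC; 4, 13, 40 are repeated."""
    idx, i = [], 1
    while len(idx) < n:
        idx.append(i)
        if i in (4, 13, 40) and len(idx) < n:
            idx.append(i)  # repeated iteration needed for convergence
        i += 1
    return idx

def hyperbolic_rotate(theta, n=16):
    """Rotation mode: returns (K*cosh(theta), K*sinh(theta)) for a gain K."""
    x, y, z = 1.0, 0.0, theta
    for i in hyperbolic_indices(n):
        d = 1.0 if z >= 0 else -1.0   # drive the residual angle z toward 0
        t = 2.0 ** -i                 # a right shift by i bits in hardware
        x, y = x + d * y * t, y + d * x * t
        z -= d * math.atanh(t)
    return x, y

def linear_vectoring_divide(y, x, n=16):
    """Linear vectoring mode: approximates y / x for x > 0 and |y / x| < 1."""
    q = 0.0
    for i in range(1, n + 1):
        d = 1.0 if y >= 0 else -1.0   # drive the residual y toward 0
        t = 2.0 ** -i
        y -= d * x * t
        q += d * t
    return q

def sigmoid_cordic(x):
    """Sigmoid for |x| <= 1 via tanh(x / 2) = sinh(x / 2) / cosh(x / 2)."""
    c, s = hyperbolic_rotate(0.5 * x)
    return 0.5 * (1.0 + linear_vectoring_divide(s, c))

for v in (-1.0, 0.0, 0.5, 1.0):
    print(v, sigmoid_cordic(v), 1.0 / (1.0 + math.exp(-v)))
```

Both loops use only shifts, additions, and a small table of atanh constants, which is why this family of algorithms maps onto FPGA logic without multipliers.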
The entire architecture is fully pipelined and implemented on a Xilinx Virtex-7 FPGA using a 16-bit fixed-point representation. Experimental results demonstrate a significant reduction in hardware utilization, requiring only 835 logic slices with zero DSP usage. Additionally, the design achieves a mean absolute error of 4.23×10⁻⁴, outperforming several recent sigmoid implementations.