🤖 AI Summary
To address high latency and low energy efficiency in FFT and SVD computations within AI models, this paper proposes a reconfigurable hardware acceleration architecture tailored for Xilinx FPGAs. The architecture introduces a novel integration of dataflow control, watermark embedding, and tightly coupled FFT/SVD compute units, enabling dynamic reconfiguration. It employs pipelined CORDIC arithmetic, customized memory access patterns, and parallelized SVD decomposition to jointly optimize throughput, security, and robustness. Experimental evaluation demonstrates 12.3× and 9.6× speedups for FFT and SVD, respectively, over CPU/GPU software implementations, along with an 8.2× improvement in energy efficiency. This work establishes an efficient, secure, and scalable hardware acceleration paradigm for frequency-domain and matrix-decomposition-intensive workloads in AI systems.
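The summary mentions pipelined CORDIC arithmetic as a core building block. As a rough software model of what each CORDIC pipeline stage computes (a minimal sketch under our own assumptions — the paper's actual fixed-point RTL implementation is not shown here), a rotation-mode CORDIC can be written as a shift-and-add loop, where each iteration corresponds to one hardware pipeline stage:

```python
import math

def cordic_rotate(theta, iterations=16):
    """Approximate (cos(theta), sin(theta)) via CORDIC rotation mode.

    Hypothetical floating-point model: each loop iteration mirrors one
    pipeline stage, applying a micro-rotation by atan(2^-i) using only
    shifts (here, multiplies by 2**-i) and adds.
    Valid for theta roughly in [-pi/2, pi/2].
    """
    # Precomputed micro-rotation angles and the cumulative CORDIC gain K.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for a in angles:
        gain *= math.cos(a)

    x, y, z = 1.0, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotation direction per stage
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]                   # drive residual angle to zero
    return x * gain, y * gain                # rescale by K to get cos, sin
```

In hardware, the per-stage angles and the gain correction would be precomputed constants, so each stage reduces to shifts and additions — which is what makes CORDIC attractive for deeply pipelined FFT twiddle-factor and SVD rotation units.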
📝 Abstract
This research introduces an FPGA-based hardware accelerator that optimizes the Singular Value Decomposition (SVD) and Fast Fourier Transform (FFT) operations in AI models. The proposed design aims to improve processing speed and reduce computational latency. Experiments validate the accelerator's performance benefits and demonstrate its effectiveness on FFT and SVD workloads. Built from modules for data flow control, watermark embedding, FFT, and SVD, the design achieves significant speedups over software implementations while providing strong security and robustness.
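For reference, the computation the abstract's FFT module accelerates can be sketched as an iterative radix-2 decimation-in-time FFT — a minimal software model under our own assumptions, not the paper's hardware design. Each pass of the outer loop corresponds to one butterfly stage that a pipelined FPGA implementation would unroll into dedicated logic:

```python
import cmath

def fft_radix2(x):
    """Iterative radix-2 DIT FFT (software reference model).

    Assumes len(x) is a power of two. Each `length` pass is one
    butterfly stage; in a pipelined accelerator these stages run
    concurrently on streaming data.
    """
    n = len(x)
    assert n and n & (n - 1) == 0, "length must be a power of two"
    x = [complex(v) for v in x]
    # Bit-reversal permutation (matches the hardware's reordered input).
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]
    # log2(n) butterfly stages with twiddle factors w = e^{-2*pi*i*k/length}.
    length = 2
    while length <= n:
        w_step = cmath.exp(-2j * cmath.pi / length)
        for start in range(0, n, length):
            w = 1.0 + 0j
            for k in range(start, start + length // 2):
                a, b = x[k], x[k + length // 2] * w
                x[k], x[k + length // 2] = a + b, a - b
                w *= w_step
        length <<= 1
    return x
```

In the accelerator described above, the twiddle factors would typically come from CORDIC units or lookup tables, and the customized memory access patterns mentioned in the summary would feed each butterfly stage without bank conflicts.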