AMD Versal Implementations of FAM and SSCA Estimators

📅 2025-06-22

🤖 AI Summary

To address the high computational complexity that hinders real-time spectral correlation density (SCD) estimation on embedded platforms, this paper presents high-throughput, energy-efficient implementations of two SCD estimators on the AMD Versal FPGA. The FFT Accumulation Method (FAM) runs entirely on the Versal AI Engine (AIE) array, and an efficient Strip Spectral Correlation Analyzer (SSCA) supports window sizes up to $2^{20}$. For both estimators, a generalized methodology parallelizes the computation while respecting memory size and data bandwidth constraints. Compared to an NVIDIA GeForce RTX 3090 GPU, which uses a similar 7 nm technology, the FAM and SSCA implementations achieve speedups of 4.43× and 1.90×, respectively, and energy-efficiency improvements of 30.5× and 24.5× at the same accuracy. These results make real-time SCD estimation more practical on resource-constrained embedded systems.

📝 Abstract

Cyclostationary analysis is widely used in signal processing, particularly in the analysis of human-made signals, and spectral correlation density (SCD) is often used to characterise cyclostationarity. Unfortunately, even with the fast Fourier transform (FFT), the high computational complexity of estimating the SCD limits its applicability in real-time applications. In this work, we present optimised, high-speed field-programmable gate array (FPGA) implementations of two SCD estimation techniques. Specifically, we present an implementation of the FFT accumulation method (FAM) running entirely on the AMD Versal AI engine (AIE) array. We also introduce an efficient implementation of the strip spectral correlation analyser (SSCA) that can be used for window sizes up to $2^{20}$. For both techniques, a generalised methodology is presented to parallelise the computation while respecting memory size and data bandwidth constraints. Compared to an NVIDIA GeForce RTX 3090 graphics processing unit (GPU), which uses a similar 7 nm technology to our FPGA, our FAM/SSCA implementations achieve, for the same accuracy, speedups of 4.43x/1.90x and a 30.5x/24.5x improvement in energy efficiency.
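The speedup and energy-efficiency figures can be related by simple arithmetic. The sketch below assumes that "energy efficiency" means work completed per joule on the same workload, so that the efficiency gain equals the speedup multiplied by the ratio of GPU power to FPGA power. That definition is an assumption made here for illustration; the abstract does not state it, and the function name is hypothetical.

```python
def implied_power_ratio(speedup, efficiency_gain):
    """Return the GPU-to-FPGA power ratio implied by the two reported gains.

    Assumption (not stated in the abstract): efficiency_gain equals
    speedup * (P_gpu / P_fpga), i.e. efficiency is work per joule.
    """
    return efficiency_gain / speedup


# Figures quoted in the abstract (FAM, then SSCA).
fam_ratio = implied_power_ratio(4.43, 30.5)
ssca_ratio = implied_power_ratio(1.90, 24.5)

print(f"FAM  implied power ratio: {fam_ratio:.2f}x")
print(f"SSCA implied power ratio: {ssca_ratio:.2f}x")
```

Under that assumption, the reported gains would correspond to the GPU drawing roughly 6.9 times (FAM) and 12.9 times (SSCA) the power of the FPGA design.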

Problem

Research questions and friction points this paper is trying to address.

High computational complexity of SCD estimation in real-time signal processing

Need for optimized FPGA implementations of FAM and SSCA techniques

Challenges in parallelizing computations within memory and bandwidth constraints

Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimized FPGA implementations for SCD estimation

FAM running on AMD Versal AI engine array

Efficient SSCA for large window sizes
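For readers unfamiliar with FAM, the following is a minimal NumPy reference sketch of the textbook algorithm: windowed channelizer FFTs, downconversion, pairwise channel products, and a second FFT across blocks. It is not the paper's AI Engine implementation. The function name, the Hamming window, and the default parameters are illustrative assumptions, and the sketch materialises every channel pair in memory, which is only practical for small sizes.

```python
import numpy as np


def fam_scd_magnitude(x, n_chan=16, hop=4):
    """Textbook FAM sketch: returns |SCD| indexed as [q, k, l].

    k and l are fftshift-ordered channel indices; q is the fftshift-ordered
    bin of the second FFT, which refines the cycle frequency around f_k - f_l.
    """
    x = np.asarray(x, dtype=complex)
    n_blocks = (len(x) - n_chan) // hop + 1
    starts = hop * np.arange(n_blocks)

    # Channelizer: overlapping, windowed blocks followed by an FFT per block.
    frames = x[starts[:, None] + np.arange(n_chan)[None, :]] * np.hamming(n_chan)
    spectra = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)

    # Downconversion: remove the phase advance caused by each block's offset.
    bins = np.arange(-n_chan // 2, n_chan // 2)
    spectra = spectra * np.exp(-2j * np.pi * np.outer(starts, bins) / n_chan)

    # Products of every channel pair, then a second FFT across the blocks.
    products = spectra[:, :, None] * np.conj(spectra[:, None, :])
    scd = np.fft.fftshift(np.fft.fft(products, axis=0), axes=0)
    return np.abs(scd) / n_blocks


# Usage: a complex tone centred on channel 2 of 16 (44 samples give 8 blocks).
n = np.arange(44)
tone = np.exp(2j * np.pi * 2 * n / 16)
mag = fam_scd_magnitude(tone)
peak = np.unravel_index(np.argmax(mag), mag.shape)
print(peak)
```

For this tone, the largest value should sit where both channel indices correspond to the tone's channel and the second-FFT bin is the centre (zero) bin, which is the zero-cycle-frequency power spectrum term.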
