AMD Versal Implementations of FAM and SSCA Estimators

📅 2025-06-22
🤖 AI Summary
To address the high computational complexity that hinders real-time spectral correlation density (SCD) estimation on embedded platforms, this paper presents optimised, high-speed implementations of two SCD estimators on the AMD Versal FPGA. The FFT accumulation method (FAM) runs entirely on the Versal AI engine (AIE) array, and an efficient strip spectral correlation analyser (SSCA) supports window sizes up to $2^{20}$. For both techniques, a generalised methodology parallelises the computation while respecting memory size and data bandwidth constraints. Compared to an NVIDIA GeForce RTX 3090 GPU built on a similar 7 nm technology, and for the same accuracy, the FAM and SSCA implementations achieve speedups of 4.43× and 1.90×, respectively, along with 30.5× and 24.5× improvements in energy efficiency. These results make real-time SCD estimation more practical in resource-constrained embedded systems.

📝 Abstract
Cyclostationary analysis is widely used in signal processing, particularly in the analysis of human-made signals, and spectral correlation density (SCD) is often used to characterise cyclostationarity. Unfortunately, for real-time applications, even utilising the fast Fourier transform (FFT), the high computational complexity associated with estimating the SCD limits its applicability. In this work, we present optimised, high-speed field-programmable gate array (FPGA) implementations of two SCD estimation techniques. Specifically, we present an implementation of the FFT accumulation method (FAM) running entirely on the AMD Versal AI engine (AIE) array. We also introduce an efficient implementation of the strip spectral correlation analyser (SSCA) that can be used for window sizes up to $2^{20}$. For both techniques, a generalised methodology is presented to parallelise the computation while respecting memory size and data bandwidth constraints. Compared to an NVIDIA GeForce RTX 3090 graphics processing unit (GPU), which uses a similar 7 nm technology to our FPGA, for the same accuracy, our FAM/SSCA implementations achieve speedups of 4.43x/1.90x and a 30.5x/24.5x improvement in energy efficiency.
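The headline ratios in the abstract can be related with a little arithmetic. The sketch below assumes (this is not stated in the abstract) that energy efficiency is compared on the same workload, so that the efficiency gain equals the speedup multiplied by the ratio of GPU power to FPGA power. The function name `implied_power_ratio` is hypothetical, and the resulting power ratios are derived values, not figures reported by the authors.

```python
def implied_power_ratio(efficiency_gain: float, speedup: float) -> float:
    """Power ratio (GPU / FPGA) implied by an efficiency gain and a speedup.

    Assumption: energy = power * time on the same workload, so
    efficiency_gain = speedup * (P_gpu / P_fpga).
    """
    return efficiency_gain / speedup


# Figures quoted in the abstract (FAM first, then SSCA).
fam_ratio = implied_power_ratio(30.5, 4.43)   # roughly 6.9
ssca_ratio = implied_power_ratio(24.5, 1.90)  # roughly 12.9
```

Under that assumption, most of the reported energy-efficiency gain would come from lower power draw rather than from the speedup alone.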
Problem

Research questions and friction points this paper is trying to address.

High computational complexity of SCD estimation in real-time signal processing
Need for optimized FPGA implementations of FAM and SSCA techniques
Challenges in parallelizing computations within memory and bandwidth constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimized FPGA implementations for SCD estimation
FAM running on AMD Versal AI engine array
Efficient SSCA for large window sizes
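For readers unfamiliar with FAM, the general structure of the estimator can be sketched in NumPy. This is a textbook-style reference under stated assumptions, not the paper's AI engine design: the function name `fam_scd`, the Hamming window, and the default channel count and hop size are all illustrative choices, and the second FFT is left unwindowed for brevity.

```python
import numpy as np


def fam_scd(x, n_chan=32, hop=8):
    """Reference FAM sketch (illustrative; not the paper's implementation).

    Returns S[q, k, l]: the q-th bin of the second FFT for the product of
    channel k with the conjugate of channel l.
    """
    x = np.asarray(x, dtype=complex)
    n_frames = (len(x) - n_chan) // hop + 1

    # 1) Channelise: overlapping windowed frames, one n_chan-point FFT each.
    starts = hop * np.arange(n_frames)
    frames = x[starts[:, None] + np.arange(n_chan)] * np.hamming(n_chan)
    X = np.fft.fft(frames, axis=1)

    # 2) Downconvert every channel to baseband by removing the phase
    #    accumulated at each frame's start sample.
    k = np.arange(n_chan)
    X = X * np.exp(-2j * np.pi * np.outer(starts, k) / n_chan)

    # 3) Form all channel-pair products X_k * conj(X_l) per frame.
    prod = X[:, :, None] * np.conj(X[:, None, :])

    # 4) Second FFT across frames resolves the fine cycle-frequency offset.
    return np.fft.fft(prod, axis=0) / n_frames


# Usage: a complex tone centred on channel 5 should peak at q=0, k=l=5.
n = np.arange(1024)
x = np.exp(2j * np.pi * 5 * n / 32)
S = fam_scd(x)
peak = np.unravel_index(np.argmax(np.abs(S)), S.shape)
```

In the usual FAM convention, entry `S[q, k, l]` corresponds to spectral frequency (f_k + f_l)/2 and cycle frequency f_k - f_l plus a fine offset set by `q`. The channel-pair products in step 3 grow quadratically with the number of channels, which illustrates why memory size and data bandwidth constrain parallel implementations.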
Carol Jingyi Li
Computer Engineering Lab, The University of Sydney, NSW, Australia; Reconfigurable Computing Systems Lab, The Hong Kong University of Science and Technology, Hong Kong
Ruilin Wu
Computer Engineering Lab, The University of Sydney, NSW, Australia
Philip H.W. Leong
Professor of Computer Systems, The University of Sydney
reconfigurable computing, FPGAs, signal processing, embedded systems, machine learning