Design of A Low-Latency and Parallelizable SVD Dataflow Architecture on FPGA

📅 2025-11-16
🤖 AI Summary
To address high latency, excessive resource consumption, and poor scalability on embedded platforms for real-time singular value decomposition (SVD) of large-scale data-stream matrices, this paper proposes a low-latency, FPGA-oriented data-stream architecture. The core innovation is a lightweight data-stream scheduling variant of the DSB Jacobi algorithm, achieved by restructuring the iterative flow and memory access patterns to significantly reduce on-chip Block RAM (BRAM) usage while enhancing parallelism. The architecture supports streaming input and pipelined computation, balancing computational efficiency with stringent hardware resource constraints. Experimental results demonstrate a 41.5% reduction in BRAM utilization and a 23× improvement in throughput over state-of-the-art approaches. Notably, it achieves, for the first time, real-time streaming SVD of thousand-order matrices on mid-range FPGAs—establishing an efficient hardware paradigm for high-dimensional signal analysis in edge intelligence applications.

📝 Abstract
Singular value decomposition (SVD) is widely used for dimensionality reduction and noise suppression, and it plays a pivotal role in numerous scientific and engineering applications. As matrix dimensions grow, the computational cost increases significantly, posing a serious challenge to the efficiency of data analysis and signal processing systems, especially in time-sensitive scenarios with large-scale datasets. Although various dedicated hardware architectures have been proposed to accelerate computation-intensive SVD, many of these designs suffer from limited scalability and high consumption of on-chip memory resources. Moreover, they typically overlook the computational and data transfer challenges associated with SVD, rendering them unsuitable for real-time processing of large-scale data stream matrices in embedded systems. In this paper, we propose a Data Stream-Based SVD processing algorithm (DSB Jacobi), which significantly reduces on-chip BRAM usage while improving computational speed, offering a practical solution for real-time SVD computation of large-scale data streams. Compared with previous works, our experimental results indicate that the proposed method reduces on-chip RAM consumption by 41.5 percent and improves computational efficiency by a factor of 23.
Problem

Research questions and friction points this paper is trying to address.

Accelerating SVD computation for large-scale matrices
Reducing on-chip memory consumption in FPGA designs
Enabling real-time processing of data stream matrices
Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-latency parallelizable SVD dataflow on FPGA
Data Stream-Based Jacobi algorithm for SVD
Reduces BRAM usage while boosting computation speed
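
For orientation, the Jacobi family of SVD methods that the paper builds on can be sketched in a few lines. This summary does not describe the scheduling or memory layout of DSB Jacobi, so the code below is only the textbook one-sided (Hestenes) Jacobi iteration in plain Python, not the authors' algorithm or hardware design. The function name, the `sweeps` limit, and the `tol` threshold are illustrative assumptions.

```python
import math


def one_sided_jacobi_singular_values(a, sweeps=30, tol=1e-12):
    """Singular values of `a` (a list of rows), in descending order.

    Textbook one-sided Jacobi sketch; NOT the paper's DSB Jacobi variant.
    """
    m, n = len(a), len(a[0])
    # Store the matrix by columns: each rotation touches exactly two columns.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    for _ in range(sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])   # squared norm of column p
                beta = sum(y * y for y in cols[q])    # squared norm of column q
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))  # inner product
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # the pair is already (numerically) orthogonal
                rotated = True
                # Rotation angle that makes the two columns orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                new_p = [c * x - s * y for x, y in zip(cols[p], cols[q])]
                new_q = [s * x + c * y for x, y in zip(cols[p], cols[q])]
                cols[p], cols[q] = new_p, new_q
        if not rotated:
            break  # a full sweep without rotations means convergence
    # After convergence the column norms are the singular values.
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)


if __name__ == "__main__":
    print(one_sided_jacobi_singular_values([[1.0, 2.0], [3.0, 4.0]]))
```

Each rotation reads and writes only the two columns it pairs, so rotations on disjoint column pairs are independent of one another. That property is what makes Jacobi-type methods a common starting point for parallel and pipelined hardware implementations.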
Fangqiang Du
East China Normal University, Shanghai 200241, China
Sixuan Chong
East China Normal University, Shanghai 200241, China
Zixuan Huang
East China Normal University, Shanghai 200241, China
Rui Qin
Tsinghua University
Fengnan Mi
Shanghai Publishing and Printing College, Shanghai 200093, China
Caibao Hu
Department of Critical Care Medicine, Zhejiang Hospital, No. 12, Lingyin Road, Xihu District, Hangzhou, Zhejiang 310013, China
Jiangang Chen
Research Assistant, University of Wisconsin-Madison