🤖 AI Summary
To address high latency, excessive resource consumption, and poor scalability on embedded platforms for real-time singular value decomposition (SVD) of large-scale data-stream matrices, this paper proposes a low-latency, FPGA-oriented data-stream architecture. The core innovation is a lightweight data-stream scheduling variant of the Jacobi algorithm, termed DSB Jacobi, which restructures the iterative flow and memory access patterns to significantly reduce on-chip Block RAM (BRAM) usage while enhancing parallelism. The architecture supports streaming input and pipelined computation, balancing computational efficiency with stringent hardware resource constraints. Experimental results demonstrate a 41.5% reduction in BRAM utilization and a 23× improvement in throughput over state-of-the-art approaches. Notably, it achieves, for the first time, real-time streaming SVD of thousand-order matrices on mid-range FPGAs, establishing an efficient hardware paradigm for high-dimensional signal analysis in edge intelligence applications.
📝 Abstract
Singular value decomposition (SVD) is widely used for dimensionality reduction and noise suppression, and it plays a pivotal role in numerous scientific and engineering applications. As matrix dimensions grow rapidly, the computational cost increases significantly, posing a serious challenge to the efficiency of data analysis and signal processing systems, especially in time-sensitive scenarios with large-scale datasets. Although various dedicated hardware architectures have been proposed to accelerate computationally intensive SVD, many of these designs suffer from limited scalability and high consumption of on-chip memory resources. Moreover, they typically overlook the computational and data-transfer challenges associated with SVD, rendering them unsuitable for real-time processing of large-scale data-stream matrices in embedded systems. In this paper, we propose a Data Stream-Based SVD processing algorithm (DSB Jacobi), which significantly reduces on-chip BRAM usage while improving computational speed, offering a practical solution for real-time SVD computation of large-scale data streams. Compared with previous works, our experimental results indicate that the proposed method reduces on-chip BRAM consumption by 41.5% and improves computational efficiency by a factor of 23.
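For readers unfamiliar with the baseline, the classical one-sided Jacobi SVD that such hardware designs typically build on can be sketched as follows. This is a minimal NumPy illustration of the textbook algorithm only, not the paper's DSB streaming/FPGA variant; the function name, tolerance, and sweep limit are our own choices, and the input is assumed to have full column rank.

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """Textbook one-sided Jacobi SVD: repeatedly orthogonalise pairs of
    columns of A with plane rotations, accumulating the rotations in V.
    On convergence the column norms of A are the singular values and the
    normalised columns form U, so A = U @ diag(sigma) @ V.T.
    Assumes A has full column rank (illustrative sketch only)."""
    A = np.asarray(A, dtype=float).copy()
    m, n = A.shape
    V = np.eye(n)
    for _ in range(max_sweeps):
        off = 0.0  # largest scaled off-diagonal coupling seen this sweep
        for i in range(n - 1):
            for j in range(i + 1, n):
                ai, aj = A[:, i], A[:, j]
                alpha, beta, gamma = ai @ ai, aj @ aj, ai @ aj
                off = max(off, abs(gamma) / np.sqrt(alpha * beta))
                if abs(gamma) < tol:
                    continue  # columns already (numerically) orthogonal
                # Rotation angle that zeroes the (i, j) inner product:
                zeta = (beta - alpha) / (2.0 * gamma)
                sgn = 1.0 if zeta >= 0 else -1.0
                t = sgn / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                R = np.array([[c, s], [-s, c]])
                A[:, [i, j]] = A[:, [i, j]] @ R  # rotate the column pair
                V[:, [i, j]] = V[:, [i, j]] @ R  # accumulate rotations
        if off < tol:
            break
    sigma = np.linalg.norm(A, axis=0)
    U = A / sigma
    return U, sigma, V
```

Because each sweep touches every column pair, a naive implementation needs the whole matrix resident in memory, which is exactly the BRAM pressure the streaming reformulation targets.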