🤖 AI Summary
Traditional Locally Competitive Algorithms (LCA) for Convolutional Sparse Coding (CSC) suffer from slow convergence, non-convex optimization induced by hard thresholding, and poor hardware efficiency. To address these issues, the paper proposes a hardware-cooperative LCA optimization framework built on three key innovations: (1) a warp-level parallel LCA solver that jointly accelerates sparse coding and dictionary updates; (2) block-wise local response constraints and sliding-window sparse regularization to improve biological plausibility and convergence stability; and (3) an adaptive dynamic thresholding mechanism that mitigates the suboptimal solutions caused by hard thresholding. Evaluated on standard benchmarks including BSD500, the approach achieves a 5.3× speedup over conventional LCA, cuts the memory footprint to one quarter, improves reconstruction PSNR by 1.8 dB, and, critically, enables real-time video CSC for the first time.
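To make the LCA iteration and the role of an adaptive threshold concrete, here is a minimal single-signal sketch of classic LCA dynamics (Rozell-style soft-thresholded membrane updates) combined with a simple annealed threshold schedule. The function name, parameters, and decay schedule are illustrative assumptions; this is not the paper's warp-level solver or its specific thresholding rule.

```python
import numpy as np

def lca_sparse_code(x, Phi, n_iters=100, tau=10.0,
                    lam0=0.5, lam_min=0.05, decay=0.97):
    """Minimal LCA sparse-coding sketch with an annealed (adaptive) threshold.

    x   : (d,) input signal
    Phi : (d, k) dictionary with unit-norm columns
    Returns the sparse code a of shape (k,).
    """
    b = Phi.T @ x                            # driving input
    G = Phi.T @ Phi - np.eye(Phi.shape[1])   # lateral inhibition (Gram minus identity)
    u = np.zeros(Phi.shape[1])               # membrane potentials
    lam = lam0
    for _ in range(n_iters):
        # Soft threshold the potentials to get the current activations.
        a = np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)
        # Discrete-time LCA dynamics: u' = b - u - (Phi^T Phi - I) a.
        u += (b - u - G @ a) / tau
        # Anneal the threshold toward lam_min (illustrative adaptive schedule).
        lam = max(lam_min, lam * decay)
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)
```

The dictionary updates, block-wise local response constraints, and warp-level GPU mapping described in the summary would sit on top of this core per-signal iteration; the sketch only illustrates the thresholded competitive dynamics that those components accelerate and stabilize.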