🤖 AI Summary
Edge intelligence demands ultra-low-latency, high-energy-efficiency DNN inference, yet conventional FPGA accelerators—relying heavily on DSP blocks for multiply-accumulate (MAC) operations—face inherent resource and flexibility limitations. Method: This work proposes a novel LUT-only computing architecture that bypasses DSP units entirely, leveraging fine-grained LUT logic mapping and hardware-algorithm co-optimization to enable highly customized, precision-preserving DNN execution. Contribution/Results: We systematically survey the evolution of LUT-based DNN architectures and quantitatively analyze the latency–power–accuracy trade-off, identifying reconfigurability enhancement and sparse computation integration as key research directions. Experiments demonstrate that our approach achieves 32–57% lower latency and 2.1×–3.8× higher energy efficiency over DSP-based baselines under identical resource constraints, establishing a new paradigm for FPGA-accelerated DNN inference at the edge.
📝 Abstract
Low-latency, energy-efficient deep neural network (DNN) inference is critical for edge applications, where traditional cloud-based deployment suffers from high latency and security risks. Field-Programmable Gate Arrays (FPGAs) offer a compelling solution, balancing reconfigurability, power efficiency, and real-time performance. However, conventional FPGA-based DNNs rely heavily on digital signal processing (DSP) blocks for multiply-accumulate (MAC) operations, which limits scalability. LUT-based DNNs address this challenge by fully leveraging FPGA lookup tables (LUTs) for computation, improving resource utilization and reducing inference latency. This survey provides a comprehensive review of LUT-based DNN architectures, including their evolution, design methodologies, and performance trade-offs, and outlines promising directions for future research.
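To make the idea of replacing MAC operations with table lookups concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method of any architecture covered by the survey: the neuron has six binary inputs (chosen because six-input LUTs are common on current FPGAs), and the weights, bias, and function names are all hypothetical. The neuron's output is precomputed for every input pattern, so inference becomes a single table read with no multiplications.

```python
from itertools import product

# Hypothetical parameters for illustration only; not taken from the survey.
WEIGHTS = [3, -2, 1, -4, 2, 1]
BIAS = -1


def bits_to_index(bits):
    """Pack a sequence of 0/1 values into an integer, most significant bit first."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index


def mac_neuron(bits, weights, bias):
    """Reference neuron: multiply-accumulate followed by a threshold at zero."""
    acc = bias + sum(w * b for w, b in zip(weights, bits))
    return 1 if acc >= 0 else 0


def build_lut(weights, bias):
    """Enumerate all 2^K input patterns once and store the neuron output for each."""
    k = len(weights)
    table = [0] * (1 << k)
    for bits in product((0, 1), repeat=k):
        table[bits_to_index(bits)] = mac_neuron(bits, weights, bias)
    return table


def lut_neuron(bits, table):
    """Inference as a single table read; no arithmetic on the weights at run time."""
    return table[bits_to_index(bits)]


TABLE = build_lut(WEIGHTS, BIAS)
```

The table has 2^K entries for K inputs, so this direct enumeration is only practical for small fan-in; that exponential growth is one reason the resource, latency, and accuracy trade-offs mentioned in the abstract matter.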