🤖 AI Summary
This work addresses the energy-efficiency and latency bottlenecks of deploying deep neural networks on FPGAs, which stem from reliance on conventional multiply-accumulate operations, as well as the fragmented nature of existing approaches, which hinders unified evaluation. To this end, the authors propose BitLogic, the first end-to-end trainable, FPGA-native neural network framework. Built around lookup tables (LUTs) as fundamental computational units, BitLogic eliminates multiply-accumulate operations entirely and enables gradient-driven training through differentiable LUT nodes, a hardware-aware output head, and a boundary-consistent LUT relaxation strategy. The framework further supports automatic RTL generation, translating trained PyTorch models into synthesizable HDL. Experiments demonstrate that BitLogic achieves 72.3% accuracy on CIFAR-10 using fewer than 0.3M logic gates, with per-sample inference latency under 20 nanoseconds, significantly improving the accuracy-efficiency trade-off on FPGAs.
📝 Abstract
The energy and latency costs of deep neural network inference are increasingly driven by deployment rather than training, motivating hardware-specialized alternatives to arithmetic-heavy models. Field-Programmable Gate Arrays (FPGAs) provide an attractive substrate for such specialization, yet existing FPGA-based neural approaches are fragmented and difficult to compare. We present BitLogic, a fully gradient-based, end-to-end trainable framework for FPGA-native neural networks built around Lookup Table (LUT) computation. BitLogic replaces multiply-accumulate operations with differentiable LUT nodes that map directly to FPGA primitives, enabling native binary computation, sparse connectivity, and efficient hardware realization. The framework offers a modular functional API supporting diverse architectures, along with learned encoders, hardware-aware heads, and multiple boundary-consistent LUT relaxations. An automated Register Transfer Level (RTL) export pipeline translates trained PyTorch models into synthesizable HDL, ensuring equivalence between software and hardware inference. Experiments across standard vision benchmarks and heterogeneous hardware platforms demonstrate competitive accuracy and substantial gains in FPGA efficiency, including 72.3% test accuracy on CIFAR-10 achieved with fewer than 0.3M logic gates, while attaining sub-20 ns single-sample inference using only LUT resources.
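To make the idea of a differentiable, boundary-consistent LUT node concrete, here is a minimal sketch of one plausible relaxation: the multilinear extension of a k-input truth table. The function name `soft_lut` and this particular parameterization are illustrative assumptions, not the paper's actual implementation; the key property it shares with the relaxations described above is that at binary (0/1) inputs it reduces exactly to a hard table lookup, so software and hardware inference agree at the corners.

```python
from itertools import product

def soft_lut(entries, xs):
    """Multilinear relaxation of a k-input LUT (illustrative sketch).

    entries: list of 2**k table values (these would be the learnable
             parameters in a trainable LUT network).
    xs:      k relaxed inputs, each in [0, 1].

    The output is a polynomial in xs, so it is differentiable, and at
    binary corners it equals the ordinary table lookup -- the
    "boundary-consistent" property mentioned in the abstract.
    """
    k = len(xs)
    assert len(entries) == 2 ** k
    out = 0.0
    # Sum over all 2**k input patterns, weighting each table entry by
    # the probability-style factor prod(x_i if bit set else 1 - x_i).
    for idx, bits in enumerate(product([0, 1], repeat=k)):
        w = 1.0
        for b, x in zip(bits, xs):
            w *= x if b else (1.0 - x)
        out += entries[idx] * w
    return out

# A 2-input XOR table: entries indexed by the bit pattern (x1, x2).
xor_entries = [0.0, 1.0, 1.0, 0.0]
print(soft_lut(xor_entries, [1.0, 0.0]))  # hard corner: exact lookup, 1.0
print(soft_lut(xor_entries, [0.5, 0.5]))  # relaxed interior point, 0.5
```

After training, each node's entries would be rounded to bits and emitted directly as an FPGA LUT primitive, which is what allows the MAC-free, LUT-only inference path described in the abstract.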