LL-ViT: Edge Deployable Vision Transformers with Look Up Table Neurons

📅 2025-11-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high computational overhead, memory footprint, and energy consumption of Vision Transformers (ViTs) on edge FPGA platforms, this paper proposes LL-ViT, a lightweight ViT architecture. LL-ViT natively integrates learnable Lookup Table (LUT)-based neurons into the Transformer backbone, replacing multiplication-intensive operations—particularly channel mixing—with LUT-based layers. It further introduces a neural learning-driven LUT generation mechanism jointly optimized with FPGA hardware constraints. Evaluated on CIFAR-10, CIFAR-100, and Tiny-ImageNet, LL-ViT achieves 95.5%, 78.8%, and 60.9% top-1 accuracy, respectively, while eliminating over 60% of model weights, cutting multiplications by 50%, improving energy efficiency by 1.9×, and achieving 1.3× lower inference latency. The core contribution lies in the deep co-design of LUT neurons and the Transformer architecture, enabling synergistic optimization across accuracy, computational efficiency, and FPGA hardware adaptability.

📝 Abstract
Vision Transformers have been tremendously successful in computer vision tasks. However, their large computational, memory, and energy demands are a challenge for edge inference on FPGAs -- a field that has seen a recent surge in demand. We recognize the benefits of recent works on logic and Look Up Table (LUT) based networks, such as LogicNets, NeuraLUT, DWN, among others, in offering models that simultaneously reduce both the memory and compute footprints. However, these models natively do not perform well on common vision tasks, such as CIFAR-10/100. In this work, we propose LL-ViT, a novel edge optimized vision transformer design that integrates layers of LUT neurons within the transformer architecture. Based on our characterization that reveals that a majority of model weights and computations are from the channel mixer (MLP layer), we design an alternate LUT-based channel mixer, and simultaneously develop an FPGA-based accelerator for LL-ViT. Contrary to some attempts to replace each multiplication with a table lookup, our architecture utilizes a neural learning approach which natively learns the LUT functions. This approach allows for reduced model sizes, and a computational and energy-efficient inference solution for vision transformer models. Evaluating on edge-suitable workloads, we achieve accuracies of 95.5% on CIFAR-10, 78.8% on CIFAR-100, and 60.9% on Tiny-ImageNet datasets, comparable to the baseline transformer. LL-ViT eliminates over 60% of the model weights and 50% of the multiplications in the model, and achieves 1.9x energy efficiency and 1.3x lower latency over an integer quantized ViT accelerator, while also offering superior throughput against prior works at a 10.9W power budget.
Problem

Research questions and friction points this paper is trying to address.

Vision Transformers have high computational demands for edge deployment
Existing LUT-based networks perform poorly on vision tasks like CIFAR-10/100
Need energy-efficient ViT models for FPGA edge inference with low latency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates LUT neurons within transformer architecture
Uses neural learning approach for LUT functions
Develops FPGA accelerator for edge deployment
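The core idea above — replacing multiply-heavy layers with neurons that learn lookup-table functions — follows the LogicNets/DWN line of work the abstract cites. A minimal inference-time sketch of such a LUT-neuron layer is below; the fan-in, layer width, and wiring here are illustrative assumptions, not LL-ViT's actual channel-mixer configuration:

```python
import numpy as np

# Hedged sketch of a LUT-neuron layer in the LogicNets/DWN style that
# LL-ViT builds on. Sizes and wiring are illustrative, not the paper's.
# Each neuron reads FAN_IN binary inputs, packs them into an integer
# index, and looks up a learned table entry; on an FPGA the index is
# simply the input wires, so inference needs no multipliers.

FAN_IN = 4      # input bits per LUT neuron (assumption)
N_NEURONS = 8   # neurons in the layer (assumption)

rng = np.random.default_rng(0)
# One 2^FAN_IN-entry table per neuron. During training these entries are
# the learnable parameters (learned via a differentiable relaxation).
tables = rng.standard_normal((N_NEURONS, 2 ** FAN_IN))
# Fixed sparse wiring: which input bits feed each neuron.
wiring = rng.integers(0, FAN_IN * N_NEURONS, size=(N_NEURONS, FAN_IN))

def lut_layer(bits: np.ndarray) -> np.ndarray:
    """Inference through the LUT layer: index each table with packed bits."""
    gathered = bits[wiring]                   # (N_NEURONS, FAN_IN), in {0,1}
    weights = 1 << np.arange(FAN_IN)          # bit weights 1, 2, 4, 8
    idx = gathered @ weights                  # pack bits into a table index
    return tables[np.arange(N_NEURONS), idx]  # one lookup per neuron

x = rng.integers(0, 2, size=FAN_IN * N_NEURONS)  # binarized activations
y = lut_layer(x)
print(y.shape)  # (8,)
```

Because each neuron's entire input-output function lives in its table, the parameter count and compute cost scale with the number of table entries rather than with a dense weight matrix — the property the paper exploits to shrink the channel mixer.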
Shashank Nag
Department of Electrical and Computer Engineering, The University of Texas at Austin, USA
Alan T. L. Bacellar
Department of Electrical and Computer Engineering, The University of Texas at Austin, USA
Zachary Susskind
Department of Electrical and Computer Engineering, The University of Texas at Austin, USA
Anshul Jha
The University of Texas at San Antonio, USA
Logan Liberty
Department of Electrical and Computer Engineering, The University of Texas at Austin, USA
Aishwarya Sivakumar
Department of Electrical and Computer Engineering, The University of Texas at Austin, USA
Eugene B. John
The University of Texas at San Antonio, USA
Krishnan Kailas
Independent Researcher
Priscila M. V. Lima
Federal University of Rio de Janeiro, Brazil
Neeraja J. Yadwadkar
Assistant Professor, University of Texas at Austin (Networked Systems, Cloud Computing, Machine Learning)
Felipe M. G. França
Instituto de Telecomunicações, Portugal
Lizy K. John
Department of Electrical and Computer Engineering, The University of Texas at Austin, USA