Image Recognition with Online Lightweight Vision Transformer: A Survey

πŸ“… 2025-05-06
πŸ€– AI Summary
This paper addresses the high computational cost and memory footprint of Vision Transformers (ViTs) in image classification through a systematic survey of online lightweighting strategies. It organizes the field into three synergistic directions: (1) efficient component design, including attention mechanisms and modular pruning; (2) input-adaptive dynamic networks; and (3) knowledge distillation. Representative methods from each direction are compared under a unified evaluation on the ImageNet-1K benchmark, quantifying the trade-offs among accuracy, parameter count, throughput, and memory usage. The survey identifies dynamic sparsification and collaborative distillation as critical frontiers for lightweight ViT research and releases open-source resources to support reproducibility.

πŸ“ Abstract
The Transformer architecture has achieved significant success in natural language processing, motivating its adaptation to computer vision tasks. Unlike convolutional neural networks, vision transformers inherently capture long-range dependencies and enable parallel processing, yet they lack convolutional inductive biases and efficiency, facing significant computational and memory challenges that limit their real-world applicability. This paper surveys online strategies for generating lightweight vision transformers for image recognition, focusing on three key areas: Efficient Component Design, Dynamic Networks, and Knowledge Distillation. We evaluate representative methods for each topic on the ImageNet-1K benchmark, analyzing trade-offs among accuracy, parameter count, throughput, and other metrics to highlight their respective advantages, disadvantages, and flexibility. Finally, we propose future research directions and potential challenges in the lightweighting of vision transformers, aiming to inspire further exploration and provide practical guidance for the community. Project Page: https://github.com/ajxklo/Lightweight-VIT
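One recurring idea behind the dynamic-network strategies the abstract mentions is to spend compute only on the most informative patch tokens of each image. As a minimal illustrative sketch (a generic formulation, not any specific method from the survey; the function name and scoring rule are assumptions), pruning patch tokens by their [CLS]-attention score might look like:

```python
import numpy as np

def prune_tokens(tokens: np.ndarray, cls_attn: np.ndarray,
                 keep_ratio: float = 0.5) -> np.ndarray:
    """Keep the highest-scoring patch tokens, ranked by [CLS] attention.

    tokens:   (N, D) patch token embeddings
    cls_attn: (N,) attention weights from the [CLS] token to each patch
    """
    k = max(1, int(round(keep_ratio * tokens.shape[0])))
    keep = np.argsort(cls_attn)[::-1][:k]   # indices of the top-k tokens
    return tokens[np.sort(keep)]            # preserve original spatial order

# Toy example: 8 tokens of dimension 4, half of them kept.
rng = np.random.default_rng(0)
toks = rng.standard_normal((8, 4))
attn = rng.random(8)
pruned = prune_tokens(toks, attn, keep_ratio=0.5)
```

In practice such scores come from the attention maps of an intermediate ViT block, and the keep ratio can itself be predicted per input, which is what makes the computation "dynamic".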
Problem

Research questions and friction points this paper is trying to address.

Adapting the Transformer architecture to image recognition without prohibitive cost
Reducing the computational and memory overhead of vision transformers
Surveying lightweight ViT strategies for real-world deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Efficient Component Design for lightweight vision transformers
Dynamic Network strategies for adaptive processing
Knowledge Distillation to enhance model efficiency
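Of the three directions above, knowledge distillation is the most self-contained to illustrate: a small student ViT is trained to match both the ground-truth labels and the softened predictions of a larger teacher. A minimal sketch of the standard soft-target distillation loss (a generic textbook formulation, not the multi-stage scheme of any surveyed paper; all names and default values here are illustrative):

```python
import numpy as np

def softmax(z: np.ndarray, t: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax, numerically stabilized."""
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits: np.ndarray,
                      teacher_logits: np.ndarray,
                      labels: np.ndarray,
                      t: float = 2.0, alpha: float = 0.5) -> float:
    """Blend hard-label cross-entropy with soft-target KL divergence.

    The t**2 factor keeps soft-target gradients on the same scale as
    the hard-label term when the temperature changes.
    """
    p_s = softmax(student_logits, t)
    p_t = softmax(teacher_logits, t)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)),
                axis=-1).mean()
    hard = softmax(student_logits)
    ce = -np.log(hard[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * ce + (1 - alpha) * (t ** 2) * kl
```

A higher temperature flattens the teacher distribution so the student also learns the relative similarities among wrong classes, which is where much of the distillation signal comes from.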
Zherui Zhang
Beijing University of Posts and Telecommunications, China
Rongtao Xu
MBZUAI (formerly CASIA, HUST)
Intelligent Robots · Embodied AI · VLA · VLM · Spatiotemporal AI
Jie Zhou
Beijing University of Posts and Telecommunications, China
Changwei Wang
Shandong Computer Science Center
Multimodal Learning · Embodied AI · Edge Intelligent Computing · AI for Healthcare · Safety Alignment
Xingtian Pei
Beijing University of Posts and Telecommunications, China
Wenhao Xu
Unknown affiliation
Jiguang Zhang
State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, China
Li Guo
Beijing University of Posts and Telecommunications, China
Longxiang Gao
Professor, Qilu University of Technology; Adjunct Professor, University of Southern Queensland
Edge AI · Federated Learning · Machine Learning · Quantum Computing
Wenbo Xu
Sun Yat-sen University
Multimodal · Multimedia
Shibiao Xu
Beijing University of Posts and Telecommunications
Computer Vision · Machine Learning · Computer Graphics