An Efficient Real-Time Object Detection Framework on Resource-Constricted Hardware Devices via Software and Hardware Co-design

πŸ“… 2021-07-01
πŸ›οΈ IEEE International Conference on Application-Specific Systems, Architectures, and Processors
πŸ“ˆ Citations: 13
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing tensor decomposition-based rank selection for embedded devices relies heavily on manual trial-and-error or incurs prohibitive computational overhead from automatic optimization. To address this, we propose a software-hardware co-designed real-time object detection framework. Our approach uniquely integrates Tensor Train (TT) decomposition with FPGA acceleration in a deeply coupled manner, enabling joint optimization of model compression ratio and hardware execution efficiency. Specifically, we apply TT decomposition to compress YOLOv5, design a custom FPGA accelerator, and perform software-hardware co-compiled optimizations. Evaluated on Jetson Nano and Xilinx Zynq FPGA platforms, the framework achieves 68% model size reduction, 3.2Γ— inference speedup, and end-to-end latency under 32 msβ€”while preserving high detection accuracy. This work establishes a scalable, co-design paradigm for efficient, lightweight vision models at the edge.

πŸ“ Abstract
The fast development of object detection techniques has attracted attention to developing efficient Deep Neural Networks (DNNs). However, the current state-of-the-art DNN models cannot provide a balanced solution among accuracy, speed, and model size. This paper proposes an efficient real-time object detection framework on resource-constricted hardware devices through hardware and software co-design. The Tensor Train (TT) decomposition is proposed for compressing the YOLOv5 model. By utilizing the unique characteristics given by the TT decomposition, we develop an efficient hardware accelerator based on FPGA devices. Experimental results show that the proposed method can significantly reduce the model size and improve the execution time.
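As background on the compression step, the sketch below shows the standard TT-SVD algorithm (sequential truncated SVDs) that Tensor Train decomposition is built on. This is a generic illustration in numpy, not the authors' implementation; the rank cap `max_rank` and the reconstruction helper are illustrative choices.

```python
import numpy as np

def tt_decompose(tensor, max_rank):
    """TT-SVD: factor a d-way tensor into a train of 3-way cores
    via sequential truncated SVDs. Generic sketch, not the paper's code."""
    shape = tensor.shape
    cores, rank_prev = [], 1
    mat = tensor.reshape(shape[0], -1)
    for k in range(len(shape) - 1):
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(S))
        # Core k holds the left factor, folded to (r_{k-1}, n_k, r_k).
        cores.append(U[:, :r].reshape(rank_prev, shape[k], r))
        # Carry the remainder forward for the next SVD.
        mat = (S[:r, None] * Vt[:r]).reshape(r * shape[k + 1], -1)
        rank_prev = r
    cores.append(mat.reshape(rank_prev, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the cores back into a dense tensor (for checking error)."""
    t = cores[0]
    for core in cores[1:]:
        t = np.tensordot(t, core, axes=([-1], [0]))
    return t.reshape([c.shape[1] for c in cores])
```

In a compression setting, a layer's weight tensor is decomposed with small `max_rank`, trading reconstruction error for a large reduction in stored parameters; the cores' regular contraction pattern is what makes TT layers amenable to a custom FPGA datapath.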
Problem

Research questions and friction points this paper is trying to address.

Automating tensor rank selection for neural network compression
Reducing computational complexity in model compression methods
Balancing compression efficiency with minimal accuracy loss
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatic budget-aware rank selection method
Layer-Wise Imprinting Quantitation with proxy classifier
Scaling factor for varying computational budgets
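To make the budget-aware rank-selection idea in the bullets above concrete, here is a hypothetical greedy allocator: it grants one extra rank at a time to whichever layer gains the most singular-value energy per parameter spent, until the parameter budget is exhausted. The energy-per-parameter criterion and all names here are illustrative assumptions, not the paper's Layer-Wise Imprinting Quantitation method.

```python
import numpy as np

def budget_aware_ranks(spectra, param_costs, budget):
    """Greedy rank allocation under a parameter budget (hypothetical sketch).
    spectra: per-layer 1-D arrays of singular values, descending.
    param_costs: parameters added per unit of rank in each layer."""
    ranks = [0] * len(spectra)
    used = 0
    while True:
        best, best_gain = None, 0.0
        for i, s in enumerate(spectra):
            if ranks[i] < len(s) and used + param_costs[i] <= budget:
                # Energy captured by the next singular value, per parameter.
                gain = s[ranks[i]] ** 2 / param_costs[i]
                if gain > best_gain:
                    best, best_gain = i, gain
        if best is None:  # budget exhausted or all layers at full rank
            break
        ranks[best] += 1
        used += param_costs[best]
    return ranks
```

A scaling factor on `budget` would let the same procedure target devices with different memory or compute envelopes, which matches the "varying computational budgets" bullet.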
πŸ”Ž Similar Papers
No similar papers found.
Shiyi Luo
Computational Science Research Center, San Diego State University, San Diego, USA
Mingshuo Liu
Computational Science Research Center, University of California Irvine, Irvine, USA
Pu Sun
Department of Electrical and Computer Engineering, University of California, Davis, Davis, USA
Yifeng Yu
Tsinghua University
Shangping Ren
Department of Computer Science, San Diego State University, USA
Yu Bai
Department of Electrical and Computer Engineering, California State University Fullerton, Fullerton, USA