AI Summary
Existing tensor decomposition-based rank selection for embedded devices relies heavily on manual trial-and-error or incurs prohibitive computational overhead from automatic optimization. To address this, we propose a software-hardware co-designed real-time object detection framework. Our approach uniquely integrates Tensor Train (TT) decomposition with FPGA acceleration in a deeply coupled manner, enabling joint optimization of model compression ratio and hardware execution efficiency. Specifically, we apply TT decomposition to compress YOLOv5, design a custom FPGA accelerator, and perform software-hardware co-design optimizations. Evaluated on Jetson Nano and Xilinx Zynq FPGA platforms, the framework achieves a 68% model size reduction, a 3.2× inference speedup, and end-to-end latency under 32 ms, while preserving high detection accuracy. This work establishes a scalable co-design paradigm for efficient, lightweight vision models at the edge.
Abstract
The rapid development of object detection techniques has drawn attention to the design of efficient Deep Neural Networks (DNNs). However, current state-of-the-art DNN models cannot provide a balanced solution among accuracy, speed, and model size. This paper proposes an efficient real-time object detection framework for resource-constrained hardware devices through hardware and software co-design. Tensor Train (TT) decomposition is applied to compress the YOLOv5 model. By exploiting the unique structural characteristics of the TT decomposition, we develop an efficient hardware accelerator based on FPGA devices. Experimental results show that the proposed method significantly reduces the model size and improves execution time.
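To make the compression step concrete, the following is a minimal sketch of TT decomposition via the standard TT-SVD algorithm (sequential truncated SVDs), applied to a reshaped weight tensor. This is an illustrative NumPy implementation with a hypothetical `max_rank` cap, not the paper's actual rank-selection procedure or FPGA-side layout.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a d-way tensor into Tensor Train cores via sequential truncated SVDs.

    Each core has shape (r_{k-1}, n_k, r_k), with boundary ranks r_0 = r_d = 1.
    `max_rank` is an illustrative uniform cap on the TT-ranks.
    """
    shape = tensor.shape
    cores = []
    rank_prev = 1
    # Unfold the tensor so the first mode (times the incoming rank) indexes rows.
    mat = tensor.reshape(rank_prev * shape[0], -1)
    for k in range(len(shape) - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))                      # truncate to the rank cap
        cores.append(u[:, :r].reshape(rank_prev, shape[k], r))
        # Carry the remainder forward and fold in the next mode.
        mat = (s[:r, None] * vt[:r]).reshape(r * shape[k + 1], -1)
        rank_prev = r
    cores.append(mat.reshape(rank_prev, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into a full tensor (for error checking)."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=(-1, 0))
    # Drop the boundary ranks r_0 = r_d = 1.
    return out.reshape([c.shape[1] for c in cores])
```

With an aggressive rank cap, the cores hold far fewer parameters than the original tensor, which is the source of the model size reduction; the chained small contractions are also what the FPGA accelerator can exploit.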