Replication Study and Benchmarking of Real-Time Object Detection Models

📅 2024-05-11
đŸ›ïž arXiv.org
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
This work addresses the poor reproducibility and inconsistent benchmarking of real-time object detection models. We establish a standardized training and inference-evaluation framework built upon MMDetection, systematically reproducing state-of-the-art models, including DETR, RTMDet, ViTDet, and YOLOv7, on MS COCO 2017. Our methodology ensures end-to-end reproducible configurations, joint evaluation of accuracy and latency across multiple GPUs, and strict alignment with the original training protocols and hyperparameters. Key contributions are: (1) demonstrating the superior accuracy–latency trade-off of anchor-free detectors (e.g., RTMDet, YOLOX); (2) revealing widespread reproducibility challenges: the reproduced RTMDet and YOLOv7 match their original performance, whereas DETR and ViTDet fall short; and (3) quantitatively confirming a strong negative correlation between accuracy and inference speed, along with significant degradation in the inference efficiency of pretrained models under resource-constrained conditions.
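The joint accuracy–latency evaluation above depends on how latency is actually measured. A minimal sketch of such a measurement loop in plain Python, assuming a warmup phase and percentile reporting (the `measure_latency` helper, its parameters, and the stand-in model are illustrative, not the paper's exact protocol):

```python
import statistics
import time

def measure_latency(infer, inputs, warmup=10, runs=100):
    """Time repeated single-batch inference calls after a warmup phase.

    On a GPU, a device synchronization (e.g. torch.cuda.synchronize())
    would be needed around each timestamp; this sketch is CPU-only.
    """
    for _ in range(warmup):              # warm caches, allocators, JIT
        infer(inputs)
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        infer(inputs)
        times.append(time.perf_counter() - t0)
    times.sort()
    return {
        "median_ms": 1000 * statistics.median(times),
        "p95_ms": 1000 * times[int(0.95 * len(times))],
    }

# Stand-in "model": any callable works; a real benchmark would pass the
# detector's forward pass and a preprocessed image batch instead.
stats = measure_latency(lambda x: sum(v * v for v in x), list(range(10_000)))
```

Reporting a median plus a tail percentile, rather than a single mean, is a common way to keep one slow outlier run from skewing the comparison.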

📝 Abstract
This work examines the reproducibility and benchmarking of state-of-the-art real-time object detection models. As object detection models are often used in real-world contexts such as robotics, where inference time is paramount, simply measuring a model's accuracy is not enough to compare them. We thus compare the accuracy and inference speed of a large variety of object detection models on multiple graphics cards. In addition to this large benchmarking effort, we also reproduce the following models from scratch using PyTorch on the MS COCO 2017 dataset: DETR, RTMDet, ViTDet, and YOLOv7. More importantly, we propose a unified training and evaluation pipeline, based on MMDetection's features, to better compare models. Our implementations of DETR and ViTDet could not achieve accuracy or speed comparable to what is declared in the original papers, whereas the reproduced RTMDet and YOLOv7 could match those figures. The studied papers are also found to be generally lacking for reproducibility purposes. As for MMDetection pretrained models, inference speed is severely reduced under limited computing resources (larger, more accurate models even more so). Moreover, the results exhibit a strong trade-off between accuracy and speed, dominated by anchor-free models, notably RTMDet and YOLOX. The code used in this paper and in all the experiments is available in the repository at https://github.com/Don767/segdet_mlcr2024.
Problem

Research questions and friction points this paper is trying to address.

Evaluating reproducibility of real-time object detection models
Benchmarking accuracy and inference speed across hardware
Assessing performance gap between original and reproduced models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reproduced multiple models from scratch in PyTorch
Proposed a unified training and evaluation pipeline based on MMDetection
Benchmarked the accuracy–speed trade-off across multiple GPUs
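The accuracy–speed comparison above amounts to asking which models are Pareto-optimal in the (latency, accuracy) plane. A small illustrative sketch (the `pareto_frontier` helper and all numbers are placeholders, not the paper's measured results):

```python
def pareto_frontier(results):
    """Keep models not dominated by any other, i.e. no competitor is both
    at least as fast (lower or equal latency) and at least as accurate."""
    frontier = []
    for name, lat, ap in results:
        dominated = any(
            o_lat <= lat and o_ap >= ap and (o_lat, o_ap) != (lat, ap)
            for _, o_lat, o_ap in results
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Placeholder numbers for illustration only (latency in ms, COCO mAP).
results = [
    ("model_a", 10.0, 40.0),
    ("model_b", 25.0, 45.0),
    ("model_c", 30.0, 43.0),  # dominated by model_b: slower and less accurate
]
print(pareto_frontier(results))  # → ['model_a', 'model_b']
```

Plotting only the frontier models makes the kind of trade-off claim in the abstract (anchor-free detectors dominating) directly visible.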
Pierre-Luc Asselin
Département de physique, de génie physique et d'optique, Université Laval, Québec, Québec, Canada.
Vincent Coulombe
Département de génie électrique et de génie informatique, Université Laval, Québec, Québec, Canada.
William Guimont-Martin
Université Laval
Deep Learning · 3D Object Detection · Point Clouds · Robotics
William Larrivée-Hardy
Département d'informatique et de génie logiciel, Université Laval, Québec, Québec, Canada.