🤖 AI Summary
This work addresses the poor generalization of models trained on existing vehicle detection datasets, which carry significant geographic and viewpoint biases, to the dense, heterogeneous, and unstructured urban traffic scenes typical of developing countries. To bridge this gap, we introduce BMD-45, a large-scale CCTV-based vehicle detection benchmark comprising 45K images and 480K high-quality human-annotated bounding boxes collected from over 3,600 real-world deployed cameras. The dataset encompasses 14 fine-grained vehicle categories, including region-specific classes such as auto-rickshaws and tempo travellers, and authentically captures challenges like extreme viewing angles, severe occlusion, and high traffic density. Baseline experiments using YOLO and DETR family models show that models trained on BMD-45 achieve an mAP@0.50:0.95 of 83.8%, outperforming models fine-tuned on UA-DETRAC (33.6%) by a factor of roughly 2.5. This gap underscores the need for geographically diverse traffic benchmarks, and BMD-45 fills critical gaps in geographic diversity and scene realism.
📝 Abstract
Robust vehicle detection from fixed CCTV cameras is critical for Intelligent Transportation Systems. Yet existing benchmarks predominantly feature relatively homogeneous, highly organized traffic patterns captured from ego-centric driving perspectives or controlled aerial views. This regional and sensor-viewpoint bias creates a significant gap: models trained on datasets such as UA-DETRAC and COCO struggle to generalize to the dense, heterogeneous, disorganized traffic conditions observed in rapidly developing urban centers in emerging economies. To address this limitation, we introduce BMD-45, a large-scale dataset comprising 480K bounding boxes annotated over 45K images captured from over 3.6K operational Safe City CCTV cameras. BMD-45 contains 14 fine-grained vehicle categories, including region-specific modes such as auto-rickshaws and tempo travellers, which are not present in existing benchmarks. The dataset captures real-world deployment challenges, including extreme viewpoint variation, occlusion, and high vehicle density. We establish comprehensive baselines using state-of-the-art detectors and reveal a striking domain gap: models fine-tuned on UA-DETRAC achieve only 33.6% mAP@0.50:0.95, compared with 83.8% when trained in-domain on BMD-45, a 2.5x improvement that persists even when accounting for novel vehicle classes. This performance gap underscores the critical need for geographically diverse traffic benchmarks and establishes BMD-45 as a baseline for developing robust perception systems in underrepresented urban environments worldwide. The dataset is available at: https://huggingface.co/datasets/iisc-aim/BMD-45.
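For readers unfamiliar with the headline metric, the following is a minimal sketch of what mAP@0.50:0.95 thresholds mean and how the reported 2.5x figure follows from the two scores. It assumes the metric follows the common COCO-style convention of averaging over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05; the helper names `iou` and `thresholds_passed` are hypothetical and are not taken from the paper's evaluation code.

```python
# Assumption: COCO-style thresholds for mAP@0.50:0.95, i.e. 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def thresholds_passed(pred_box, gt_box):
    """Count how many of the ten IoU thresholds a predicted box clears."""
    overlap = iou(pred_box, gt_box)
    return sum(overlap >= t for t in IOU_THRESHOLDS)


# Scores reported in the abstract (percent mAP@0.50:0.95).
in_domain, cross_domain = 83.8, 33.6
ratio = in_domain / cross_domain  # about 2.49, reported as 2.5x
```

A prediction that overlaps its ground-truth box only loosely counts as a match at the lower thresholds but not the higher ones, which is why this averaged metric rewards tight localization as well as correct classification.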