AI Summary
This work addresses the lack of dedicated datasets for detecting extremely small, sparse litter set against cluttered roadside backgrounds in dashcam videos, a task that currently relies on manual inspection. To bridge this gap, we introduce RoLID-11K, the first large-scale dataset specifically designed for roadside litter detection from a dashcam perspective, comprising over 11,000 annotated images captured across diverse driving scenarios in the UK. The dataset exhibits a pronounced long-tailed distribution and is challenging because the target objects are extremely small. We benchmark several state-of-the-art object detectors on RoLID-11K and find that transformer-based architectures, such as CO-DETR, achieve the best localization accuracy, whereas real-time YOLO variants are limited by their coarse feature hierarchies. RoLID-11K establishes a new benchmark for small-object detection in dynamic driving environments.
Abstract
Roadside litter poses environmental, safety and economic challenges, yet current monitoring relies on labour-intensive surveys and public reporting, providing limited spatial coverage. Existing vision datasets for litter detection focus on street-level still images, aerial scenes or aquatic environments, and do not reflect the unique characteristics of dashcam footage, where litter appears extremely small, sparse and embedded in cluttered road-verge backgrounds. We introduce RoLID-11K, the first large-scale dataset for roadside litter detection from dashcams, comprising over 11k annotated images spanning diverse UK driving conditions and exhibiting pronounced long-tail and small-object distributions. We benchmark a broad spectrum of modern detectors, from accuracy-oriented transformer architectures to real-time YOLO models, and analyse their strengths and limitations on this challenging task. Our results show that while CO-DETR and related transformers achieve the best localisation accuracy, real-time models remain constrained by coarse feature hierarchies. RoLID-11K establishes a challenging benchmark for extreme small-object detection in dynamic driving scenes and aims to support the development of scalable, low-cost systems for roadside-litter monitoring. The dataset is available at https://github.com/xq141839/RoLID-11K.
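To make the "long-tail and small-object" characterisation concrete, the sketch below shows one way such statistics could be computed from a list of bounding boxes. It is an illustration only, not the authors' analysis code: the size thresholds are the standard COCO conventions (small below 32² pixels of area, medium below 96², large otherwise), and the class names, box sizes and the `summarise` helper are hypothetical rather than taken from RoLID-11K.

```python
from collections import Counter

# COCO size conventions, in pixel area of the bounding box.
SMALL_MAX = 32 ** 2    # below this: "small"
MEDIUM_MAX = 96 ** 2   # below this (and not small): "medium"


def size_bucket(width, height):
    """Return the COCO size bucket for a box of the given pixel dimensions."""
    area = width * height
    if area < SMALL_MAX:
        return "small"
    if area < MEDIUM_MAX:
        return "medium"
    return "large"


def summarise(annotations):
    """Count boxes per size bucket and per class.

    annotations: iterable of (class_name, box_width_px, box_height_px).
    """
    sizes = Counter()
    classes = Counter()
    for name, width, height in annotations:
        sizes[size_bucket(width, height)] += 1
        classes[name] += 1
    return sizes, classes


# Hypothetical annotations for illustration; not real RoLID-11K data.
example = [
    ("bottle", 12, 20),
    ("bottle", 10, 14),
    ("bag", 40, 50),
    ("can", 8, 9),
    ("bottle", 120, 100),
]

sizes, classes = summarise(example)
print(dict(sizes))             # share of small boxes indicates small-object dominance
print(classes.most_common())   # a steep drop-off across classes indicates a long tail
```

On a real dataset, a large fraction of boxes in the "small" bucket and a class histogram dominated by a few categories would correspond to the two properties the abstract describes.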