🤖 AI Summary
To address the challenges of detecting small-scale objects and occluded waste (e.g., under grass or stones) in garbage detection, this paper proposes a lightweight object detection framework that incorporates privileged information. Specifically, bounding boxes are encoded as binary masks that serve as privileged supervision signals, improving localization accuracy without increasing model parameters. The method is integrated with five widely used detectors (YOLOv5, YOLOv8, Faster R-CNN, RetinaNet, and DETR) and evaluated on three benchmarks: within-dataset on SODA, and cross-dataset on BDW and UAVVaste. Experimental results show consistent improvements in detecting small and occluded waste, with average mAP gains of 3.2–5.7%. The approach also improves generalization while keeping computational cost unchanged, which makes it a candidate for real-time, large-scale waste inspection deployments.
📝 Abstract
As litter pollution continues to rise globally, developing automated tools that detect litter effectively remains a significant challenge. This study presents a novel approach that combines, for the first time, privileged information with deep learning object detection to improve litter detection while maintaining model efficiency. We evaluate our method across five widely used object detection models, addressing challenges such as detecting small litter and objects partially obscured by grass or stones. A further key contribution is a means of encoding bounding box information as a binary mask, which can be fed to the detection model to refine detection guidance. Through within-dataset evaluation on the SODA dataset and cross-dataset evaluation on the BDW and UAVVaste litter detection datasets, we demonstrate consistent performance improvements across all models. Our approach not only improves detection accuracy on the dataset used for training but also generalises well to other litter detection contexts. Crucially, these improvements are achieved without increasing model complexity or adding extra layers, ensuring computational efficiency and scalability. Our results suggest that this methodology offers a practical solution for litter detection, balancing accuracy and efficiency in real-world applications.
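To make the idea of encoding bounding boxes as a binary mask concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function name `boxes_to_mask`, the `(x_min, y_min, x_max, y_max)` pixel-coordinate box format, and the choice to merge all boxes into a single mask are assumptions made for illustration. The abstract does not specify how the mask is passed to the detector, so that step is left out.

```python
import numpy as np


def boxes_to_mask(boxes, height, width):
    """Rasterise axis-aligned boxes into a single binary mask.

    Hypothetical helper: boxes are assumed to be (x_min, y_min, x_max, y_max)
    in pixel coordinates. Pixels inside any box are set to 1, all others to 0.
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    for x_min, y_min, x_max, y_max in boxes:
        # Round outwards to whole pixels, then clip to the image bounds.
        x0 = int(np.clip(np.floor(x_min), 0, width))
        y0 = int(np.clip(np.floor(y_min), 0, height))
        x1 = int(np.clip(np.ceil(x_max), 0, width))
        y1 = int(np.clip(np.ceil(y_max), 0, height))
        mask[y0:y1, x0:x1] = 1
    return mask


# Two example boxes on a 10x8 image; the second extends past the right edge
# and is clipped to the image.
mask = boxes_to_mask([(2, 1, 5, 4), (6, 6, 12, 8)], height=8, width=10)
```

The resulting array has the same spatial size as the image, so it could in principle be stacked with the image channels or used as an auxiliary target during training; which of these (if either) matches the paper is not stated in the abstract.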