🤖 AI Summary
This work investigates how fine-grained privileged information (such as masks, saliency maps, and depth cues) that is available only during training can improve object detection without increasing inference complexity. To this end, it systematically introduces the Learning Using Privileged Information (LUPI) paradigm into object detection for the first time, proposing a model-agnostic teacher–student distillation framework in which a teacher network fuses the privileged signals to guide the training of a student detector. The approach requires no architectural changes at inference time and integrates with mainstream detectors. Experiments on benchmarks including Pascal VOC 2012 and UAV-based litter detection datasets show consistent, significant gains in detection accuracy, particularly for medium and large objects, with no increase in model size or inference overhead.
📝 Abstract
This paper investigates the integration of the Learning Using Privileged Information (LUPI) paradigm in object detection to exploit fine-grained, descriptive information available during training but not at inference. We introduce a general, model-agnostic methodology for injecting privileged information (such as bounding box masks, saliency maps, and depth cues) into deep learning-based object detectors through a teacher-student architecture. Experiments are conducted across five state-of-the-art object detection models and multiple public benchmarks, including UAV-based litter detection datasets and Pascal VOC 2012, to assess the impact on accuracy, generalization, and computational efficiency. Our results demonstrate that LUPI-trained students consistently outperform their baseline counterparts, achieving significant boosts in detection accuracy with no increase in inference complexity or model size. Performance improvements are especially marked for medium and large objects, while ablation studies reveal that intermediate weighting of teacher guidance optimally balances learning from privileged and standard inputs. The findings affirm that the LUPI framework provides an effective and practical strategy for advancing object detection systems in both resource-constrained and real-world settings.
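To make the idea of "intermediate weighting of teacher guidance" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a bounding box mask is a binary map that is 1 inside ground-truth boxes, and that the student is trained on a convex combination of its ordinary detection loss and a mean-squared-error term pulling its features toward the teacher's. The names `box_mask`, `lupi_student_loss`, and `alpha`, as well as the choice of MSE, are hypothetical; the abstract does not specify the loss form.

```python
from typing import List, Sequence, Tuple

# (x_min, y_min, x_max, y_max) in pixels; the max edge is exclusive (assumption).
Box = Tuple[int, int, int, int]


def box_mask(height: int, width: int, boxes: Sequence[Box]) -> List[List[float]]:
    """Hypothetical privileged input: 1.0 inside any ground-truth box, else 0.0."""
    mask = [[0.0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        # Clip each box to the image bounds before filling it in.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = 1.0
    return mask


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("feature vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def lupi_student_loss(
    detection_loss: float,
    student_feats: Sequence[float],
    teacher_feats: Sequence[float],
    alpha: float,
) -> float:
    """Assumed form: (1 - alpha) * detection loss + alpha * teacher-matching loss.

    alpha = 0 ignores the teacher; alpha = 1 ignores the ground-truth loss.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    distill = mse(student_feats, teacher_feats)
    return (1.0 - alpha) * detection_loss + alpha * distill


if __name__ == "__main__":
    # The teacher would see the image plus this mask; the student sees only the image.
    print(box_mask(2, 3, [(1, 0, 3, 1)]))
    print(lupi_student_loss(1.0, [1.0, 3.0], [1.0, 1.0], alpha=0.25))
```

Under these assumptions, only the student is kept at inference time, so the mask and the teacher add no cost after training, which is consistent with the claim of unchanged model size and inference complexity.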