🤖 AI Summary
Existing online data selection methods rarely transfer to object detection: the task's structural complexity and domain gaps make it hard to assess the value of individual samples. This paper proposes DetGain, which it presents as the first framework to bring marginal contribution modeling to online data selection for object detection. DetGain dynamically estimates each sample’s marginal gain in mean Average Precision (mAP) by quantifying how much the sample perturbs the global mAP, then compares the teacher's and the student's contributions. It requires no changes to the detector architecture and relies only on prediction quality evaluation, global score distribution modeling, and teacher-student divergence analysis, which keeps it minimally intrusive and broadly applicable. Experiments with multiple detectors on COCO show that DetGain accelerates convergence and improves final accuracy, remains robust to noisy or low-quality data, and combines with knowledge distillation for further gains.
📝 Abstract
High-quality data has become a primary driver of progress under scaling laws, with curated datasets often outperforming much larger unfiltered ones at lower cost. Online data curation extends this idea by dynamically selecting training samples based on the model's evolving state. While effective in classification and multimodal learning, existing online sampling strategies rarely extend to object detection because of its structural complexity and domain gaps. We introduce DetGain, an online data curation method designed specifically for object detection that estimates the marginal perturbation each image causes to dataset-level Average Precision (AP), based on the quality of the predictions for that image. By modeling global score distributions, DetGain efficiently estimates the global AP change and computes teacher-student contribution gaps to select informative samples at each iteration. The method is architecture-agnostic and minimally intrusive, enabling straightforward integration into diverse object detection architectures. Experiments on the COCO dataset with multiple representative detectors show consistent improvements in accuracy. DetGain is also robust to low-quality data and can be combined with knowledge distillation techniques to further enhance performance, highlighting its potential as a general and complementary strategy for data-efficient object detection.
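The selection idea in the abstract can be illustrated with a small Python sketch. This is not DetGain's estimator. The abstract says the method models global score distributions to estimate the AP change efficiently, whereas this toy version simply recomputes AP with and without one image's detections. The function names, the single-class non-interpolated AP, the fixed reference pool, and the choice of "teacher minus student" as the gap direction are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: (confidence score, is_true_positive).
Detection = Tuple[float, bool]


def average_precision(dets: Sequence[Detection], num_gt: int) -> float:
    """Single-class, non-interpolated AP: precision summed at each
    true-positive rank, divided by the number of ground-truth objects."""
    if num_gt == 0:
        return 0.0
    ranked = sorted(dets, key=lambda d: d[0], reverse=True)
    tp_seen = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            tp_seen += 1
            precision_sum += tp_seen / rank
    return precision_sum / num_gt


def marginal_ap(pool_dets: Sequence[Detection], pool_gt: int,
                img_dets: Sequence[Detection], img_gt: int) -> float:
    """Brute-force AP change when one image's detections and ground-truth
    count are merged into a reference pool (an assumed stand-in for the
    global score distribution)."""
    base = average_precision(pool_dets, pool_gt)
    merged = average_precision(list(pool_dets) + list(img_dets),
                               pool_gt + img_gt)
    return merged - base


def select_by_gap(student_gain: Sequence[float],
                  teacher_gain: Sequence[float], k: int) -> List[int]:
    """Return indices of the k images with the largest assumed gap
    (teacher contribution minus student contribution)."""
    if len(student_gain) != len(teacher_gain):
        raise ValueError("gain lists must have the same length")
    gaps = [t - s for s, t in zip(student_gain, teacher_gain)]
    order = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return sorted(order[:k])


# Toy usage: one image, scored by a student and by a teacher.
pool = [(0.9, True), (0.8, False), (0.7, True)]
student = marginal_ap(pool, 2, [(0.85, False)], 1)  # false positive, missed object
teacher = marginal_ap(pool, 2, [(0.95, True)], 1)   # confident correct detection
```

In this toy setup the student's detections lower the pooled AP while the teacher's raise it, so the image has a positive gap and would rank highly for selection under the assumed criterion.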