🤖 AI Summary
High annotation costs and reliance on model-specific internals hinder active learning for wildlife camera-trap animal detection. To address this, we propose a model-agnostic, multi-granularity active learning framework that jointly leverages object-level and image-level uncertainty and diversity metrics to evaluate sample informativeness in a black-box manner, requiring no access to model gradients or confidence scores. This work introduces the model-agnostic paradigm to camera-trap animal detection for the first time and designs a generalizable score-fusion mechanism. Experiments on benchmark datasets demonstrate that annotating only 30% of the samples achieves detection performance (mAP) comparable to or exceeding that of full supervision, with gains of 1.2–2.8 mAP points. The approach significantly reduces annotation effort while remaining broadly applicable across diverse detection models.
📝 Abstract
Smart data selection is becoming increasingly important in data-driven machine learning. Active learning offers a promising solution by allowing machine learning models to be trained effectively on the most informative samples drawn from large datasets. Wildlife data captured by camera traps are excessive in volume, demanding tremendous effort in data labelling and in training animal detection models. Applying active learning to minimise the amount of labelled data required would therefore be a great aid in enabling automated wildlife monitoring and conservation. However, existing active learning techniques require that the machine learning model (i.e., the object detector) be fully accessible, limiting their applicability. In this paper, we propose a model-agnostic active learning approach for detecting animals captured by camera traps. Our approach integrates uncertainty and diversity quantities of samples at both the object level and the image level into the active learning sample selection process. We validate our approach on a benchmark animal dataset. Experimental results demonstrate that, using only 30% of the training data selected by our approach, a state-of-the-art animal detector achieves performance equal to or greater than that obtained with the complete training dataset.
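The abstract does not give the fusion rule itself, but the selection step it describes can be sketched in a model-agnostic way. The sketch below is a minimal illustration, not the paper's method: it assumes each unlabelled image already has a scalar uncertainty score and a scalar diversity score (however those are obtained in a black-box setting, e.g. from prediction disagreement and feature distances), and the names `fuse_scores`, `select_batch`, and the weight `alpha` are hypothetical.

```python
import numpy as np

def fuse_scores(uncertainty, diversity, alpha=0.5):
    """Combine per-image uncertainty and diversity into one score.

    Both inputs are min-max normalised so the weighted sum is
    scale-free; `alpha` trades off the two criteria. This is a
    generic fusion sketch, not the paper's specific mechanism.
    """
    u = np.asarray(uncertainty, dtype=float)
    d = np.asarray(diversity, dtype=float)
    norm = lambda x: (x - x.min()) / (x.max() - x.min() + 1e-9)
    return alpha * norm(u) + (1.0 - alpha) * norm(d)

def select_batch(uncertainty, diversity, budget, alpha=0.5):
    """Return indices of the `budget` highest-scoring unlabelled images,
    i.e. the ones sent to annotators in the next active learning round."""
    scores = fuse_scores(uncertainty, diversity, alpha)
    return np.argsort(scores)[::-1][:budget].tolist()

# Toy example: four unlabelled images, budget of two.
picked = select_batch([0.9, 0.1, 0.5, 0.8], [0.2, 0.9, 0.7, 0.3], budget=2)
print(picked)  # → [2, 3]
```

Because the fused score only needs per-sample numbers, the detector itself stays a black box: any source of uncertainty and diversity estimates can be plugged in without touching model gradients or internals.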