🤖 AI Summary
To address image classification under scarce labeled data, this paper proposes a reinforcement learning–based adaptive active learning framework. The method introduces Deep Deterministic Policy Gradient (DDPG) into active learning for the first time, formulating an end-to-end trainable Markov Decision Process (MDP) that autonomously learns optimal sample querying policies—eliminating reliance on hand-crafted heuristics. The state space encodes model prediction uncertainty and sample diversity, while the action space corresponds to label acquisition decisions; a policy network dynamically optimizes the sampling process. Evaluated on CIFAR-10, CIFAR-100, and SVHN under identical labeling budgets, the framework achieves up to a 5.2% absolute improvement in classification accuracy over state-of-the-art active learning baselines, demonstrating substantial gains in sample efficiency during iterative low-shot training.
📝 Abstract
Deep learning currently achieves outstanding performance on a variety of tasks, including image classification, especially with large neural networks. The success of these models depends on the availability of large collections of labeled training data. In many real-world scenarios, labeled data are scarce, and hand-labeling them is time-consuming, labor-intensive, and costly. Active learning is an alternative paradigm that mitigates the hand-labeling effort: only a small fraction of samples is iteratively selected from a large pool of unlabeled data, annotated by an expert (a.k.a. the oracle), and then used to update the learning models. However, existing active learning solutions depend on handcrafted strategies that may fail in highly variable learning environments (datasets, scenarios, etc.). In this work, we devise an adaptive active learning method based on a Markov Decision Process (MDP). Our framework combines deep reinforcement learning and active learning through a Deep Deterministic Policy Gradient (DDPG) in order to dynamically adapt the sample selection strategy to the oracle's feedback and the learning environment. Extensive experiments conducted on three image classification benchmarks show superior performance against several existing active learning strategies.
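To make the iterative query loop described above concrete, here is a minimal, hypothetical sketch of pool-based active learning on toy 2-D data. All names, the toy logistic classifier, and the budget sizes are illustrative assumptions, not the paper's implementation; in particular, the per-sample uncertainty scorer below stands in for the DDPG actor that the paper *learns* to produce query scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pool: two Gaussian blobs standing in for image features (hypothetical data).
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

def predict_proba(w, X):
    """Logistic model standing in for the deep image classifier."""
    return 1.0 / (1.0 + np.exp(-(X @ w)))

def train(X, y, epochs=200, lr=0.5):
    """Refit the classifier on the current labeled set (full-batch gradient descent)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * X.T @ (predict_proba(w, X) - y) / len(y)
    return w

def state_features(w, X):
    """State: prediction entropy per unlabeled sample (one uncertainty feature)."""
    p = predict_proba(w, X).clip(1e-6, 1 - 1e-6)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

# Active learning loop: here the 'policy' is a fixed uncertainty scorer,
# whereas the paper trains a DDPG actor to emit these scores adaptively.
labeled = list(rng.choice(len(X), 10, replace=False))
for _ in range(5):                                    # five query rounds under a fixed budget
    pool = [i for i in range(len(X)) if i not in labeled]
    w = train(X[labeled], y[labeled])
    scores = state_features(w, X[pool])               # score each candidate
    picks = np.array(pool)[np.argsort(-scores)[:10]]  # query the top-10
    labeled.extend(picks.tolist())                    # oracle annotates the queries

w = train(X[labeled], y[labeled])
acc = ((predict_proba(w, X) > 0.5) == y).mean()
print(f"labeled: {len(labeled)}, accuracy: {acc:.2f}")
```

Replacing the fixed entropy scorer with a trained actor network, and rewarding it with the post-update accuracy gain, is the step that turns this heuristic loop into the MDP formulation the paper proposes.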