🤖 AI Summary
This work addresses a weakness of traditional active learning in open-set scenarios: such methods often select out-of-distribution samples for annotation, which wastes labeling budget. The authors propose a two-stage energy-based active learning framework. In the first stage, an energy-based model (EBM) separates known-class samples from unknown-class samples and filters out the latter. In the second stage, another EBM scores the informativeness of the remaining known-class samples to guide annotation selection. The framework thus applies energy-based models to both known/unknown discrimination and sample scoring in open-set active learning. Experiments show that the method improves both labeling efficiency and classification performance, outperforming existing approaches on CIFAR-10, CIFAR-100, TinyImageNet, and ModelNet40.
📝 Abstract
Active learning (AL) has emerged as a crucial methodology for minimizing labeling costs in deep learning by selecting the most valuable samples from a pool of unlabeled data for annotation. Traditional AL operates under a closed-set assumption, where all classes in the dataset are known and consistent. However, real-world scenarios often present open-set conditions in which unlabeled data contains both known and unknown classes. In such environments, standard AL techniques struggle. They can mistakenly query samples from unknown categories, leading to inefficient use of annotation budgets. In this paper, we propose a novel dual-stage energy-based framework for open-set AL. Our method employs two specialized energy-based models (EBMs). The first, an energy-based known/unknown separator, filters out samples likely to belong to unknown classes. The second, an energy-based sample scorer, assesses the informativeness of the filtered known samples. Using the energy landscape, our models distinguish between data points from known and unknown classes in the unlabeled pool by assigning lower energy to known samples and higher energy to unknown samples, ensuring that only samples from classes of interest are selected for labeling. By integrating these components, our approach ensures efficient and targeted sample selection, maximizing learning impact in each iteration. Experiments on 2D (CIFAR-10, CIFAR-100, TinyImageNet) and 3D (ModelNet40) object classification benchmarks demonstrate that our framework outperforms existing approaches, achieving superior annotation efficiency and classification performance in open-set environments.
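The two-stage selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the energies are computed, so the sketch assumes the common free-energy form E(x) = -T · logsumexp(f(x)/T) over classifier logits. It also assumes that a higher scorer energy marks a more informative sample. The names `free_energy`, `select_queries`, `energy_threshold`, and `budget` are hypothetical.

```python
import math
from typing import List, Sequence


def free_energy(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Free energy E(x) = -T * logsumexp(logits / T).

    Assumed formulation; the abstract does not define the energy function.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def select_queries(
    separator_logits: Sequence[Sequence[float]],
    scorer_logits: Sequence[Sequence[float]],
    energy_threshold: float,
    budget: int,
) -> List[int]:
    """Return indices of unlabeled pool samples to send for annotation."""
    # Stage 1: keep samples whose separator energy is low, i.e. samples
    # that look like they belong to known classes.
    kept = [
        i
        for i, logits in enumerate(separator_logits)
        if free_energy(logits) <= energy_threshold
    ]
    # Stage 2: rank the survivors by scorer energy. Treating higher energy
    # as "more informative" is an assumption made for this sketch.
    kept.sort(key=lambda i: free_energy(scorer_logits[i]), reverse=True)
    return kept[:budget]


if __name__ == "__main__":
    # Toy logits from two hypothetical models over a pool of four samples.
    separator = [[10.0, 0.0, 0.0], [0.1, 0.0, 0.1], [8.0, 1.0, 0.0], [0.0, 9.0, 0.0]]
    scorer = [[5.0, 0.0, 0.0], [3.0, 3.0, 3.0], [2.0, 2.0, 0.0], [1.0, 1.0, 1.0]]
    print(select_queries(separator, scorer, energy_threshold=-5.0, budget=2))
```

In this toy pool, sample 1 has nearly flat separator logits and therefore high energy, so stage 1 discards it as a likely unknown-class sample before stage 2 ranks the rest. In practice the threshold would have to be calibrated, for example on labeled known-class data.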