🤖 AI Summary
This paper addresses batch active learning when annotation costs vary across data points and acquisition is subject to budget constraints, focusing on scenarios with costly labeling and geospatial constraints. It proposes two Bayesian active learning strategies for batch acquisition under constraints (ConBatch-BAL), both of which select samples using uncertainty metrics computed via Bayesian neural networks: (1) a dynamic thresholding strategy that redistributes the budget across the batch, and (2) a greedy strategy that selects the top-ranked sample at each step, limited by the remaining budget. To support evaluation, the authors release two new real-world datasets of geolocated aerial images of buildings, annotated with energy efficiency or typology classes. Benchmarked against a random acquisition baseline under various budget and cost scenarios, the ConBatch-BAL strategies can reduce both the number of active learning iterations and data acquisition costs, and can even outperform the unconstrained baseline solutions.
📝 Abstract
Varying annotation costs among data points and budget constraints can hinder the adoption of active learning strategies in real-world applications. This work introduces two Bayesian active learning strategies for batch acquisition under constraints (ConBatch-BAL), one based on dynamic thresholding and one following greedy acquisition. Both select samples using uncertainty metrics computed via Bayesian neural networks. The dynamic thresholding strategy redistributes the budget across the batch, while the greedy one selects the top-ranked sample at each step, limited by the remaining budget. Focusing on scenarios with costly data annotation and geospatial constraints, we also release two new real-world datasets containing geolocated aerial images of buildings, annotated with energy efficiency or typology classes. The ConBatch-BAL strategies are benchmarked against a random acquisition baseline on these datasets under various budget and cost scenarios. The results show that the developed ConBatch-BAL strategies can reduce active learning iterations and data acquisition costs in real-world settings, and even outperform the unconstrained baseline solutions.
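The greedy strategy described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`predictive_entropy`, `greedy_constrained_batch`), the choice of predictive entropy as the uncertainty metric, and the rule of skipping unaffordable candidates in favor of the next-ranked one are all assumptions made for illustration. The abstract states only that the top-ranked sample is chosen at each step, limited by the remaining budget.

```python
import numpy as np


def predictive_entropy(mc_probs):
    """Entropy of the mean predictive distribution.

    mc_probs: array of shape (n_mc_samples, n_points, n_classes), e.g.
    class probabilities from several stochastic forward passes of a
    Bayesian neural network (assumed input format).
    """
    mean_probs = np.asarray(mc_probs).mean(axis=0)
    # Small constant avoids log(0) for classes with zero probability.
    return -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=1)


def greedy_constrained_batch(scores, costs, budget, batch_size):
    """Pick up to batch_size points by descending score within a budget.

    Hypothetical rule: walk the ranking from most to least uncertain and
    take each point whose cost fits in the remaining budget.
    """
    scores = np.asarray(scores)
    costs = np.asarray(costs)
    order = np.argsort(-scores, kind="stable")  # highest score first
    selected, remaining = [], budget
    for idx in order:
        if len(selected) == batch_size:
            break
        if costs[idx] <= remaining:
            selected.append(int(idx))
            remaining -= costs[idx]
    return selected, remaining


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.5, 0.4]  # uncertainty per candidate
    costs = [5, 4, 2, 1]           # annotation cost per candidate
    print(greedy_constrained_batch(scores, costs, budget=6, batch_size=3))
```

In this toy run, the most uncertain candidate uses most of the budget, so the second and third candidates are skipped and only the cheapest one still fits. The sketch shows how cost constraints can change which points enter a batch compared with unconstrained top-k selection.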