Myopic Bayesian Decision Theory for Batch Active Learning with Partial Batch Label Sampling

📅 2025-10-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Batch active learning suffers from an unclear choice of acquisition function, and existing methods incur high computational cost or degraded performance at large batch sizes. Method: This paper derives a theory-driven batch framework from Bayesian Decision Theory (BDT) for the myopic setting, shows that the BAIT algorithm (active learning based on V-optimal experimental design) follows from BDT under asymptotic approximations, and introduces Partial Batch Label Sampling (ParBaLS), which samples labels for only part of the batch, substantially reducing computational complexity under a limited annotation budget. The pipeline combines Bayesian logistic regression on neural network embeddings with Expected Predictive Information Gain (EPIG) acquisition. Contribution/Results: On multiple benchmark datasets, ParBaLS EPIG outperforms state-of-the-art methods under identical annotation budgets, remaining efficient and stable, especially in large-batch regimes.

📝 Abstract
Over the past couple of decades, many active learning acquisition functions have been proposed, leaving practitioners with an unclear choice of which to use. Bayesian Decision Theory (BDT) offers a universal principle to guide decision-making. In this work, we derive BDT for (Bayesian) active learning in the myopic framework, where we imagine we only have one more point to label. This derivation leads to effective algorithms such as Expected Error Reduction (EER), Expected Predictive Information Gain (EPIG), and other algorithms that appear in the literature. Furthermore, we show that BAIT (active learning based on V-optimal experimental design) can be derived from BDT and asymptotic approximations. A key challenge of such methods is the difficult scaling to large batch sizes, leading to either computational challenges (BatchBALD) or dramatic performance drops (top-$B$ selection). Here, using a particular formulation of the decision process, we derive Partial Batch Label Sampling (ParBaLS) for the EPIG algorithm. We show experimentally for several datasets that ParBaLS EPIG gives superior performance for a fixed budget and Bayesian Logistic Regression on Neural Embeddings. Our code is available at https://github.com/ADDAPT-ML/ParBaLS.
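The EPIG quantity the abstract refers to scores a candidate input x by the mutual information between its label y and the label y* of a target input x*, averaged over target inputs. A minimal Monte-Carlo sketch for classification follows; this is not the authors' code, and the array shapes, the `epig_scores` name, and the use of posterior predictive samples are illustrative assumptions:

```python
import numpy as np

def epig_scores(probs_pool, probs_targ, eps=1e-12):
    """Monte-Carlo EPIG for classification.

    probs_pool: (K, N_pool, C) predictive probs from K posterior samples
    probs_targ: (K, N_targ, C) predictive probs on target inputs
    Returns: (N_pool,) EPIG score per candidate.
    """
    K = probs_pool.shape[0]
    # Joint predictive p(y, y*) averaged over posterior samples: (N_pool, N_targ, C, C)
    joint = np.einsum('kic,kjd->ijcd', probs_pool, probs_targ) / K
    # Marginal predictives p(y) and p(y*)
    marg_pool = probs_pool.mean(axis=0)   # (N_pool, C)
    marg_targ = probs_targ.mean(axis=0)   # (N_targ, C)
    indep = marg_pool[:, None, :, None] * marg_targ[None, :, None, :]
    # Mutual information I(y; y* | x, x*), then average over target inputs
    mi = (joint * (np.log(joint + eps) - np.log(indep + eps))).sum(axis=(-2, -1))
    return mi.mean(axis=1)                # (N_pool,)
```

If all posterior samples agree, the joint predictive factorizes and every score is zero, i.e. no candidate is informative about the targets; disagreement between samples is what drives EPIG above zero.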
Problem

Research questions and friction points this paper is trying to address.

Addressing unclear choice among active learning acquisition functions
Deriving Bayesian Decision Theory for myopic active learning framework
Solving computational scaling challenges for large batch active learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Derived BDT for myopic active learning framework
Introduced Partial Batch Label Sampling for EPIG
Applied Bayesian Logistic Regression on Neural Embeddings
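The "Bayesian Logistic Regression on Neural Embeddings" ingredient above can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: a binary model, a Laplace approximation to the posterior, and the function names and hyperparameters are all choices made here. The idea is to fit a MAP logistic regression on fixed embeddings, approximate the posterior by a Gaussian at the MAP, and draw weight samples whose predictive probabilities feed an acquisition function such as EPIG.

```python
import numpy as np

def laplace_logreg(X, y, alpha=1.0, n_iter=200, lr=0.1):
    """Binary Bayesian logistic regression via a Laplace approximation.

    X: (N, D) fixed neural embeddings; y in {0, 1}.
    Returns the MAP weights and the posterior covariance (inverse Hessian).
    """
    N, D = X.shape
    w = np.zeros(D)
    for _ in range(n_iter):                    # gradient descent to the MAP
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) + alpha * w       # neg. log-posterior gradient
        w -= lr * grad / N
    p = 1.0 / (1.0 + np.exp(-X @ w))
    H = (X.T * (p * (1 - p))) @ X + alpha * np.eye(D)  # Hessian at the MAP
    return w, np.linalg.inv(H)

def posterior_predictive(X_new, w_map, cov, n_samples=100, seed=0):
    """Predictive probs from K sampled weight vectors: shape (K, N_new)."""
    rng = np.random.default_rng(seed)
    W = rng.multivariate_normal(w_map, cov, size=n_samples)  # (K, D)
    return 1.0 / (1.0 + np.exp(-(W @ X_new.T)))
```

Because the embeddings are fixed, each retraining step during active learning is a cheap convex fit in embedding space rather than a full network update, which is what makes repeated posterior refits affordable inside a batch-selection loop.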