How to Purchase Labels? A Cost-Effective Approach Using Active Learning Markets

📅 2025-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the high cost and inefficiency of acquiring labeled data in resource-constrained settings. It proposes an active learning-driven market mechanism that formulates label procurement as an optimization problem with a budget constraint and a performance improvement threshold, in contrast to existing proposals that target the purchase of features and examples. Within a single-buyer, multiple-seller framework, the authors jointly design market clearing, two active learning query strategies (variance-based and query-by-committee), and a distinct pricing mechanism for each strategy. Evaluated on real-world datasets for real estate price prediction and energy demand forecasting, the method reaches better model performance with fewer purchased labels than random sampling, and is reported to be robust to label noise and distributional shift. The core contribution is a data procurement approach that unifies active learning with microeconomic market mechanisms to trade off annotation cost against model utility.

📝 Abstract
We introduce and analyse active learning markets as a way to purchase labels, in situations where analysts aim to acquire additional data to improve model fitting, or to better train models for predictive analytics applications. This comes in contrast to the many proposals that already exist to purchase features and examples. By originally formalising the market clearing as an optimisation problem, we integrate budget constraints and improvement thresholds into the label acquisition process. We focus on a single-buyer-multiple-seller setup and propose the use of two active learning strategies (variance based and query-by-committee based), paired with distinct pricing mechanisms. They are compared to a benchmark random sampling approach. The proposed strategies are validated on real-world datasets from two critical application domains: real estate pricing and energy forecasting. Results demonstrate the robustness of our approach, consistently achieving superior performance with fewer labels acquired compared to conventional methods. Our proposal comprises an easy-to-implement practical solution for optimising data acquisition in resource-constrained environments.
Problem

Research questions and friction points this paper is trying to address.

Purchasing labels cost-effectively using active learning markets
Integrating budget constraints into label acquisition optimization
Validating the approach on real estate pricing and energy forecasting datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Active learning markets for purchasing labels
Optimization with budget constraints and thresholds
Variance and query-by-committee active learning strategies
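
The ideas listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the bootstrap committee of least-squares models, the greedy score-per-price clearing rule, the way the improvement threshold is applied, and the function names `committee_disagreement` and `select_purchases` are all assumptions made for illustration.

```python
import numpy as np


def committee_disagreement(X_lab, y_lab, X_pool, n_members=10, seed=0):
    """Score each unlabeled point by the variance of committee predictions.

    Assumption: the committee is a set of ordinary least-squares models,
    each fitted on a bootstrap resample of the buyer's labeled data.
    """
    rng = np.random.default_rng(seed)
    n = len(X_lab)
    # Add an intercept column to both design matrices.
    A_lab = np.column_stack([np.ones(n), X_lab])
    A_pool = np.column_stack([np.ones(len(X_pool)), X_pool])
    preds = []
    for _ in range(n_members):
        idx = rng.integers(0, n, size=n)  # bootstrap resample indices
        coef, *_ = np.linalg.lstsq(A_lab[idx], y_lab[idx], rcond=None)
        preds.append(A_pool @ coef)
    # High variance across members means the committee disagrees.
    return np.var(np.stack(preds), axis=0)


def select_purchases(scores, prices, budget, threshold):
    """Greedy clearing rule (hypothetical): rank offers by score per unit
    price, skip offers whose score is below the threshold, and buy while
    the running cost stays within the budget."""
    order = np.argsort(-scores / prices)
    chosen, spent = [], 0.0
    for i in order:
        if scores[i] < threshold:
            continue  # expected improvement too small to be worth buying
        if spent + prices[i] <= budget:
            chosen.append(int(i))
            spent += float(prices[i])
    return chosen, spent
```

In use, a buyer would call `committee_disagreement` on its current labeled set and the sellers' unlabeled features, then pass the resulting scores together with the sellers' asking prices to `select_purchases`. A random sampling benchmark would replace the scores with random values while keeping the same budget.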
Xiwen Huang
Dyson School of Design Engineering, Imperial College London
Pierre Pinson
Imperial College London
Forecasting, Game theory, Decision-making under uncertainty