Dependency-aware Maximum Likelihood Estimation for Active Learning

📅 2025-03-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
In active learning, sample selection is sequential: each acquisition influences subsequent selections, yet conventional maximum likelihood estimation (MLE) assumes independent and identically distributed (i.i.d.) data, so model updates are misaligned with the actual sampling process. To address this, the paper proposes Dependency-aware MLE (DMLE), which explicitly models and corrects inter-sample dependencies during parameter estimation, aligning estimation with the inherently non-i.i.d. nature of active learning. With entropy as the acquisition function and query batch sizes k = 1, 5, and 10, DMLE achieves average accuracy gains of 6.0% (k = 1), 8.6% (k = 5), and 10.5% (k = 10) over standard MLE within the first 100 queried samples, while also converging faster and reaching higher final performance. The core contribution is the principled incorporation of sampling dependence into the MLE framework, yielding parameter estimation consistent with non-i.i.d. active learning settings.

📝 Abstract
Active learning aims to efficiently build a labeled training set by strategically selecting samples to query labels from annotators. In this sequential process, each sample acquisition influences subsequent selections, causing dependencies among samples in the labeled set. However, these dependencies are overlooked during the model parameter estimation stage when updating the model using Maximum Likelihood Estimation (MLE), a conventional method that assumes independent and identically distributed (i.i.d.) data. We propose Dependency-aware MLE (DMLE), which corrects MLE within the active learning framework by addressing sample dependencies typically neglected due to the i.i.d. assumption, ensuring consistency with active learning principles in the model parameter estimation process. This improved method achieves superior performance across multiple benchmark datasets, reaching higher performance in earlier cycles compared to conventional MLE. Specifically, we observe average accuracy improvements of 6%, 8.6%, and 10.5% for $k=1$, $k=5$, and $k=10$ respectively, after collecting the first 100 samples, where entropy is the acquisition function and $k$ is the query batch size acquired at every active learning cycle.
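The abstract's acquisition setup, entropy-based querying with batch size $k$, can be sketched as follows. This is a minimal illustration of the standard entropy acquisition function, not the paper's implementation; the function names and the toy probabilities are assumptions for demonstration.

```python
import numpy as np

def entropy(probs):
    # Predictive entropy per sample; probs has shape (n_samples, n_classes).
    eps = 1e-12  # guard against log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)

def query_batch(probs, k):
    # One active learning cycle: query the k most uncertain
    # (highest-entropy) unlabeled samples.
    scores = entropy(probs)
    return np.argsort(scores)[::-1][:k]

# Toy predictive distributions over two classes.
probs = np.array([[0.5, 0.5],   # maximally uncertain
                  [0.9, 0.1],   # confident
                  [0.6, 0.4]])  # moderately uncertain
print(query_batch(probs, k=2))  # → [0 2]
```

Each queried batch changes the labeled set and hence the next model fit, which is exactly the sequential dependence the paper argues standard i.i.d. MLE ignores.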
Problem

Research questions and friction points this paper is trying to address.

Addresses sample dependencies in active learning.
Improves Maximum Likelihood Estimation (MLE) for active learning.
Enhances model accuracy in early active learning cycles.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dependency-aware MLE (DMLE) corrects the i.i.d. assumption in parameter estimation.
DMLE incorporates the sample dependencies introduced by sequential querying in active learning.
Yields improved accuracy in early active learning cycles.
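The contrast above can be illustrated with a toy objective. Standard MLE averages per-sample negative log-likelihood terms with equal weight (the i.i.d. assumption); a dependency-aware correction would reweight those terms to reflect how the acquisition process selected each sample. The specific correction DMLE uses is not given in this summary, so the weights `w` below are hypothetical placeholders, not the paper's actual formula.

```python
import numpy as np

def iid_nll(probs_of_true):
    # Standard MLE objective: mean negative log-likelihood,
    # treating all labeled samples as i.i.d.
    return -np.mean(np.log(probs_of_true))

def weighted_nll(probs_of_true, w):
    # Hypothetical dependency-aware variant: reweight each sample's
    # term to account for the selection process. The weights `w` are
    # illustrative placeholders, not DMLE's correction.
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return -np.sum(w * np.log(probs_of_true))

# Model probabilities assigned to the true labels of three queried samples.
p = np.array([0.9, 0.7, 0.6])
print(iid_nll(p))                  # equal weights
print(weighted_nll(p, [1, 2, 3]))  # unequal weights shift the objective
```

With uniform weights the two objectives coincide, which makes the point concrete: the correction only matters when the sampling process makes some terms count differently than others.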