Active Learning for Decision Trees with Provable Guarantees

📅 2026-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the label query complexity of decision trees in active learning. Under two assumptions, that each root-to-leaf path uses non-redundant features and that the input data has a regular grid structure, the study presents the first analysis of the disagreement coefficient for decision trees and introduces the first general-purpose active learning algorithm with a $(1+\epsilon)$ multiplicative error guarantee. The resulting method needs only polylogarithmically many label queries in the dataset size, and a label complexity lower bound shows that its dependence on $\epsilon$ is close to optimal.

📝 Abstract
This paper advances the theoretical understanding of active learning label complexity for decision trees as binary classifiers. We make two main contributions. First, we provide the first analysis of the disagreement coefficient for decision trees, a key parameter governing active learning label complexity. Our analysis holds under two natural assumptions required for achieving polylogarithmic label complexity: (i) each root-to-leaf path queries distinct feature dimensions, and (ii) the input data has a regular, grid-like structure. We show these assumptions are essential, as relaxing them leads to polynomial label complexity. Second, we present the first general active learning algorithm for binary classification that achieves a multiplicative error guarantee, producing a $(1+\epsilon)$-approximate classifier. By combining these results, we design an active learning algorithm for decision trees that uses only a polylogarithmic number of label queries in the dataset size, under the stated assumptions. Finally, we establish a label complexity lower bound, showing our algorithm's dependence on the error tolerance $\epsilon$ is close to optimal.
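To make the idea of disagreement-based label savings concrete, here is a minimal sketch of the classical CAL-style scheme on the simplest possible class: one-dimensional thresholds over a grid of points with noiseless labels. This is not the paper's algorithm and does not handle decision trees or the $(1+\epsilon)$ guarantee; the names `cal_threshold`, `oracle`, and `true_t` are hypothetical and chosen only for illustration. A label is requested only when the point lies in the disagreement region, that is, when two thresholds still consistent with past labels would classify it differently.

```python
import random

def cal_threshold(points, oracle, seed=0):
    """CAL-style active learning for thresholds h_t(x) = 1 if x >= t else 0.

    Assumes the realizable, noiseless setting (illustrative only).
    Returns the learned threshold and the number of label queries made.
    """
    rng = random.Random(seed)
    order = list(points)
    rng.shuffle(order)           # points arrive in random order

    lo = float("-inf")           # largest point known to have label 0
    hi = float("inf")            # smallest point known to have label 1
    queries = 0
    for x in order:
        # Consistent thresholds lie in (lo, hi]; they disagree on x
        # exactly when lo < x < hi.
        if lo < x < hi:
            queries += 1
            if oracle(x) == 1:
                hi = x
            else:
                lo = x
        # Otherwise every consistent threshold agrees on x,
        # so its label is inferred without a query.
    return hi, queries

# Hypothetical setup: a 1-D grid of 1024 points with true threshold 700.
grid = list(range(1024))
true_t = 700
oracle = lambda x: int(x >= true_t)

t_hat, n_queries = cal_threshold(grid, oracle)
predict = lambda x: int(x >= t_hat)
print(f"learned threshold: {t_hat}, label queries: {n_queries} of {len(grid)}")
```

In this toy setting the disagreement region shrinks each time a label is queried, so most points are labeled by inference rather than by the oracle. The paper's contribution is to show when a comparable effect holds for decision trees, which this sketch does not attempt to reproduce.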
Problem

Research questions and friction points this paper is trying to address.

active learning
decision trees
label complexity
disagreement coefficient
binary classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

active learning
decision trees
label complexity
disagreement coefficient
multiplicative error guarantee
Authors
Arshia Soltani Moakhar
University of Maryland
Theory of Robustness, Interpretability
Tanapoom Laoaron
University of Maryland
Faraz Ghahremani
University of Maryland
Kiarash Banihashem
University of Maryland
Mohammadtaghi Hajiaghayi
University of Maryland