REALITrees: Rashomon Ensemble Active Learning for Interpretable Trees

📅 2026-03-23

📈 Citations: 0

✨ Influential: 0

career value

213K/year

🤖 AI Summary

This work proposes the Rashomon Ensemble Active Learning (REAL) framework, which addresses the limitation of traditional active learning methods that rely on randomly perturbed models to form committees, often failing to accurately represent plausible hypotheses in the hypothesis space and thereby underestimating information gain. REAL is the first to integrate Rashomon set theory into active learning by explicitly enumerating near-optimal sparse decision trees to construct a diverse yet non-redundant committee. It further employs a PAC-Bayesian Gibbs posterior to weight committee members, enabling a more precise quantification of epistemic uncertainty to guide sample selection. Experimental results demonstrate that REAL outperforms random ensemble-based approaches on both synthetic and standard benchmark datasets, achieving faster convergence—particularly under moderate noise levels—and substantially reducing annotation costs.

Technology Category

Application Category

📝 Abstract

Active learning reduces labeling costs by selecting samples that maximize information gain. A dominant framework, Query-by-Committee (QBC), typically relies on perturbation-based diversity by inducing model disagreement through random feature subsetting or data blinding. While this approximates one notion of epistemic uncertainty, it sacrifices direct characterization of the plausible hypothesis space. We propose the complementary approach: Rashomon Ensembled Active Learning (REAL) which constructs a committee by exhaustively enumerating the Rashomon Set of all near-optimal models. To address functional redundancy within this set, we adopt a PAC-Bayesian framework using a Gibbs posterior to weight committee members by their empirical risk. Leveraging recent algorithmic advances, we exactly enumerate this set for the class of sparse decision trees. Across synthetic and established active learning baselines, REAL outperforms randomized ensembles, particularly in moderately noisy environments where it strategically leverages expanded model multiplicity to achieve faster convergence.

Problem

Research questions and friction points this paper is trying to address.

active learning

Rashomon set

interpretable models

decision trees

epistemic uncertainty

Innovation

Methods, ideas, or system contributions that make the work stand out.

Rashomon Set

Active Learning

Interpretable Decision Trees