A Simple Approximation Algorithm for Optimal Decision Tree

📅 2025-05-21
🤖 AI Summary
This paper studies the Optimal Decision Tree (ODT) problem. The input is $m$ hypotheses with a prior probability distribution, plus a set of queries, each with a cost and a known response for every hypothesis. The goal is to minimize the expected query cost of identifying the true hypothesis. ODT arises in active learning and medical diagnosis and is NP-hard. Existing $O(\ln m)$ approximation algorithms are complex in their design and/or analysis, and their leading constant factors are large. The authors propose a simple greedy approximation algorithm for ODT whose analysis relies on a potential function rather than on linear programming relaxations or intricate sampling schemes. The method achieves an approximation ratio of $8 \ln m$, where $m$ is the number of hypotheses. This is an $O(\ln m)$ guarantee with a small explicit constant, within a factor of 8 of the $\ln m$ threshold below which ODT is NP-hard to approximate. Compared with prior work, the result simplifies both the algorithm and its analysis while improving the constant factor, and the algorithm is straightforward to implement.
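The summary calls the method greedy but does not give its selection rule. For a rough picture of how a greedy decision-tree builder works, here is a Python sketch. It grows a tree top-down with a generic cost-normalized rule: among the queries that split the remaining hypotheses, it picks the one that maximizes the probability mass outside the heaviest response branch, divided by the query cost.

This scoring rule, the function names `split` and `greedy_tree`, the tree encoding, and the toy instance are all hypothetical. The sketch is not the rule analyzed in the paper, and no approximation guarantee is claimed for it.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical encoding: a leaf is a hypothesis index; an internal node is
# (query index, {response: subtree}).
Tree = Union[int, Tuple[int, Dict[int, "Tree"]]]


def split(candidates: List[int], answers: Sequence[int]) -> Dict[int, List[int]]:
    """Group the remaining hypotheses by their known response to one query."""
    branches: Dict[int, List[int]] = defaultdict(list)
    for h in candidates:
        branches[answers[h]].append(h)
    return branches


def greedy_tree(candidates: List[int], probs: Sequence[float],
                costs: Sequence[float],
                responses: Sequence[Sequence[int]]) -> Tree:
    """Build a tree top-down with an illustrative (not the paper's) greedy rule."""
    if len(candidates) == 1:
        return candidates[0]
    total = sum(probs[h] for h in candidates)
    best_j, best_score = None, 0.0
    for j, c in enumerate(costs):
        branches = split(candidates, responses[j])
        if len(branches) < 2:
            continue  # query j cannot tell any remaining hypotheses apart
        # Whatever the answer, at least the mass outside the heaviest branch
        # is ruled out; normalize by the query cost (assumed positive).
        heaviest = max(sum(probs[h] for h in b) for b in branches.values())
        score = (total - heaviest) / c
        if best_j is None or score > best_score:
            best_j, best_score = j, score
    if best_j is None:
        raise ValueError("remaining hypotheses share all responses")
    return (best_j, {r: greedy_tree(b, probs, costs, responses)
                     for r, b in split(candidates, responses[best_j]).items()})


# Toy instance (made up for illustration): 4 hypotheses, 3 queries.
probs = [0.4, 0.3, 0.2, 0.1]
costs = [1.0, 2.0, 1.0]
responses = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 1]]
print(greedy_tree(list(range(4)), probs, costs, responses))
# (2, {0: 0, 1: (0, {0: 1, 1: (1, {0: 2, 1: 3})})})
```

On this instance the rule first asks the cheap query 2, which isolates the most likely hypothesis. It then separates the rest with queries 0 and 1.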

📝 Abstract
Optimal decision tree (ODT) is a fundamental problem arising in applications such as active learning, entity identification, and medical diagnosis. An instance of ODT is given by $m$ hypotheses, out of which an unknown "true" hypothesis is drawn according to some probability distribution. An algorithm needs to identify the true hypothesis by making queries: each query incurs a cost and has a known response for each hypothesis. The goal is to minimize the expected query cost to identify the true hypothesis. We consider the most general setting with arbitrary costs, probabilities and responses. ODT is NP-hard to approximate better than $\ln m$, and there are $O(\ln m)$ approximation algorithms known for it. However, these algorithms and/or their analyses are quite complex. Moreover, the leading constant factors are large. We provide a simple algorithm and analysis for ODT, proving an approximation ratio of $8 \ln m$.
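The ODT objective can be made concrete with a small Python sketch. It evaluates a given decision tree, where each hypothesis follows its known responses down to a leaf. The tree's expected cost is the probability-weighted sum of the query costs along those paths. The sketch also checks that each leaf names the hypothesis that reached it, i.e. that the tree really identifies the truth.

The names `path_cost` and `expected_cost`, the tree encoding, and the toy instance are hypothetical, chosen only for illustration.

```python
from typing import Dict, Sequence, Tuple, Union

# Hypothetical encoding: a leaf is a hypothesis index; an internal node is
# (query index, {response: subtree}).
Tree = Union[int, Tuple[int, Dict[int, "Tree"]]]


def path_cost(tree: Tree, h: int, costs: Sequence[float],
              responses: Sequence[Sequence[int]]) -> float:
    """Total cost of the queries asked when h is the true hypothesis."""
    total = 0.0
    node = tree
    while not isinstance(node, int):
        j, children = node
        total += costs[j]
        node = children[responses[j][h]]  # follow h's known response to query j
    if node != h:
        raise ValueError(f"tree does not identify hypothesis {h}")
    return total


def expected_cost(tree: Tree, probs: Sequence[float], costs: Sequence[float],
                  responses: Sequence[Sequence[int]]) -> float:
    """ODT objective: sum over hypotheses of probability times path cost."""
    return sum(p * path_cost(tree, h, costs, responses)
               for h, p in enumerate(probs))


# Toy instance (made up for illustration): m = 4 hypotheses, 3 queries.
probs = [0.4, 0.3, 0.2, 0.1]
costs = [1.0, 2.0, 1.0]
responses = [
    [0, 0, 1, 1],  # query 0
    [0, 1, 0, 1],  # query 1
    [0, 1, 1, 1],  # query 2: isolates hypothesis 0
]

balanced = (0, {0: (1, {0: 0, 1: 1}), 1: (1, {0: 2, 1: 3})})
skewed = (2, {0: 0, 1: (0, {0: 1, 1: (1, {0: 2, 1: 3})})})

print(expected_cost(balanced, probs, costs, responses))  # about 3.0
print(expected_cost(skewed, probs, costs, responses))    # about 2.2
```

Both trees identify every hypothesis. On this instance, however, the skewed tree asks the cheap query that isolates the most likely hypothesis first. That lowers the expected cost from about 3.0 to about 2.2, even though its worst-case path is longer.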
Problem


Approximates the optimal decision tree for hypothesis identification
Minimizes expected query cost with arbitrary costs, probabilities, and responses
Simplifies complex existing ODT approximation algorithms
Innovation


Simple algorithm for optimal decision tree
Approximation ratio of $8 \ln m$
Handles arbitrary costs, probabilities, and responses
Zhengjia Zhuo
Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, USA
Viswanath Nagarajan
University of Michigan
Approximation Algorithms, Combinatorial Optimization