🤖 AI Summary
This paper studies the Optimal Decision Tree (ODT) problem: given a hypothesis class, query costs, and response probabilities, minimize the expected cost of identifying the true hypothesis. ODT arises in applications such as active learning and medical diagnosis; it is NP-hard, and existing approximation algorithms suffer from high computational complexity and large constant factors. The authors propose a simple, theoretically grounded greedy approximation algorithm for ODT whose analysis relies solely on a potential function, with no linear programming relaxations or intricate sampling schemes. The method achieves an approximation ratio of $8 \ln m$, where $m$ is the number of hypotheses, which is the first logarithmic guarantee with such a small explicit constant. This significantly simplifies the analysis and improves on prior bounds in both complexity and constants. The algorithm is interpretable, practical, and easy to implement, offering an efficient new approach to the NP-hard ODT problem.
📝 Abstract
Optimal decision tree (ODT) is a fundamental problem arising in applications such as active learning, entity identification, and medical diagnosis. An instance of ODT is given by $m$ hypotheses, out of which an unknown "true" hypothesis is drawn according to some probability distribution. An algorithm needs to identify the true hypothesis by making queries: each query incurs a cost and has a known response for each hypothesis. The goal is to minimize the expected query cost to identify the true hypothesis. We consider the most general setting with arbitrary costs, probabilities, and responses. ODT is NP-hard to approximate better than $\ln m$, and there are $O(\ln m)$ approximation algorithms known for it. However, these algorithms and/or their analyses are quite complex, and the leading constant factors are large. We provide a simple algorithm and analysis for ODT, proving an approximation ratio of $8 \ln m$.
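To make the setting concrete, here is a minimal sketch of a generic greedy decision-tree builder for this kind of instance ($m$ hypotheses with a prior, queries with costs and known per-hypothesis responses). The greedy selection rule used below, separating the most prior probability mass per unit query cost, is an illustrative assumption and not the paper's potential-function-based rule; the function and variable names are hypothetical.

```python
# Illustrative greedy sketch for the ODT setting, NOT the paper's algorithm.
# The criterion "probability mass separated per unit cost" is an assumption
# chosen for simplicity; the paper's rule is based on a potential function.
from collections import defaultdict

def greedy_tree(hypotheses, probs, queries, costs, responses):
    """Recursively pick queries until a single hypothesis remains.

    hypotheses: list of hypothesis ids still consistent with past responses
    probs:      dict hypothesis -> prior probability
    queries:    list of query ids
    costs:      dict query -> positive query cost
    responses:  dict (query, hypothesis) -> response value
    Returns either a leaf hypothesis id, or a nested dict
    {'query': q, 'children': {response: subtree}}.
    """
    if len(hypotheses) == 1:
        return hypotheses[0]
    best_q, best_score = None, -1.0
    total_mass = sum(probs[h] for h in hypotheses)
    for q in queries:
        # Group the surviving hypotheses by their response to q.
        groups = defaultdict(list)
        for h in hypotheses:
            groups[responses[(q, h)]].append(h)
        if len(groups) < 2:
            continue  # q distinguishes nothing among these hypotheses
        # Mass separated from the heaviest response group, per unit cost.
        heaviest = max(sum(probs[h] for h in g) for g in groups.values())
        score = (total_mass - heaviest) / costs[q]
        if score > best_score:
            best_q, best_score = q, score
    assert best_q is not None, "instance must be identifiable"
    children = defaultdict(list)
    for h in hypotheses:
        children[responses[(best_q, h)]].append(h)
    return {'query': best_q,
            'children': {r: greedy_tree(g, probs, queries, costs, responses)
                         for r, g in children.items()}}

def identify(tree, h, responses):
    """Follow the tree using hypothesis h's responses; return the leaf."""
    while isinstance(tree, dict):
        tree = tree['children'][responses[(tree['query'], h)]]
    return tree
```

On any identifiable instance, following the tree with the true hypothesis's responses must end at that hypothesis; the expected cost of doing so is the quantity ODT asks to minimize.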