🤖 AI Summary
This paper studies the Optimal Decision Tree (ODT) problem: given a hypothesis class, query costs, and response probabilities, minimize the expected cost of identifying the true hypothesis. ODT arises in applications such as active learning and medical diagnosis; it is NP-hard, and existing approximation algorithms suffer from high computational complexity and large constant factors. The authors propose a simple, theoretically grounded greedy approximation algorithm for ODT whose analysis relies solely on a potential function, with no linear programming relaxations or intricate sampling schemes. The method achieves an approximation ratio of $8 \ln m$, where $m$ is the number of hypotheses, which is the first logarithmic guarantee with such a small explicit constant. This significantly simplifies the analysis and improves on prior bounds in both complexity and constants. The algorithm is interpretable, practical, and easy to implement, offering an efficient new approach to the NP-hard ODT problem.
📝 Abstract
Optimal decision tree (ODT) is a fundamental problem arising in applications such as active learning, entity identification, and medical diagnosis. An instance of ODT is given by $m$ hypotheses, out of which an unknown "true" hypothesis is drawn according to some probability distribution. An algorithm needs to identify the true hypothesis by making queries: each query incurs a cost and has a known response for each hypothesis. The goal is to minimize the expected query cost to identify the true hypothesis. We consider the most general setting with arbitrary costs, probabilities, and responses. ODT is NP-hard to approximate better than $\ln m$, and there are $O(\ln m)$ approximation algorithms known for it. However, these algorithms and/or their analyses are quite complex, and the leading constant factors are large. We provide a simple algorithm and analysis for ODT, proving an approximation ratio of $8 \ln m$.
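To make the setting concrete, here is a minimal sketch of a generic greedy decision-tree builder for this kind of instance ($m$ hypotheses with a prior, queries with costs and known per-hypothesis responses). The greedy selection rule used below, separating the most prior probability mass per unit query cost, is an illustrative assumption and not the paper's potential-function-based rule; the function and variable names are hypothetical.

```python
# Illustrative greedy sketch for the ODT setting, NOT the paper's algorithm.
# The criterion "probability mass separated per unit cost" is an assumption
# chosen for simplicity; the paper's rule is based on a potential function.
from collections import defaultdict

def greedy_tree(hypotheses, probs, queries, costs, responses):
    """Recursively pick queries until a single hypothesis remains.

    hypotheses: list of hypothesis ids still consistent with past responses
    probs:      dict hypothesis -> prior probability
    queries:    list of query ids
    costs:      dict query -> positive query cost
    responses:  dict (query, hypothesis) -> response value
    Returns either a leaf hypothesis id, or a nested dict
    {'query': q, 'children': {response: subtree}}.
    """
    if len(hypotheses) == 1:
        return hypotheses[0]
    best_q, best_score = None, -1.0
    total_mass = sum(probs[h] for h in hypotheses)
    for q in queries:
        # Group the surviving hypotheses by their response to q.
        groups = defaultdict(list)
        for h in hypotheses:
            groups[responses[(q, h)]].append(h)
        if len(groups) < 2:
            continue  # q distinguishes nothing among these hypotheses
        # Mass separated from the heaviest response group, per unit cost.
        heaviest = max(sum(probs[h] for h in g) for g in groups.values())
        score = (total_mass - heaviest) / costs[q]
        if score > best_score:
            best_q, best_score = q, score
    assert best_q is not None, "instance must be identifiable"
    children = defaultdict(list)
    for h in hypotheses:
        children[responses[(best_q, h)]].append(h)
    return {'query': best_q,
            'children': {r: greedy_tree(g, probs, queries, costs, responses)
                         for r, g in children.items()}}

def identify(tree, h, responses):
    """Follow the tree using hypothesis h's responses; return the leaf."""
    while isinstance(tree, dict):
        tree = tree['children'][responses[(tree['query'], h)]]
    return tree
```

On any identifiable instance, following the tree with the true hypothesis's responses must end at that hypothesis; the expected cost of doing so is the quantity ODT asks to minimize.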