ODTlearn: A Package for Learning Optimal Decision Trees for Prediction and Prescription

📅 2023-07-28
🏛️ arXiv.org
📈 Citations: 2
Influential: 1
🤖 AI Summary
Decision trees learned with standard heuristics offer no guarantee of optimality and no direct way to enforce fairness or robustness, which matters in high-stakes predictive and prescriptive settings. This paper presents ODTlearn, an open-source Python package built on a unified mixed-integer optimization (MIO) framework. The package covers four classes of optimal decision trees: classification trees, fair classification trees, classification trees robust to distribution shifts, and prescriptive trees learned from observational data. It follows object-oriented design principles so that new problem classes, reformulations, and solution algorithms can be added, and it supports both a commercial solver (Gurobi) and an open-source solver (COIN-OR CBC). Documentation and a user guide are provided online, and the source code is hosted on GitHub.
📝 Abstract
ODTlearn is an open-source Python package that provides methods for learning optimal decision trees for high-stakes predictive and prescriptive tasks, based on the mixed-integer optimization (MIO) framework proposed in Aghaei et al. (2019) and several of its extensions. The current version of the package provides implementations for learning optimal classification trees, optimal fair classification trees, optimal classification trees robust to distribution shifts, and optimal prescriptive trees from observational data. We have designed the package to be easy to maintain and extend as new optimal decision tree problem classes, reformulation strategies, and solution algorithms are introduced. To this end, the package follows object-oriented design principles and supports both commercial (Gurobi) and open-source (COIN-OR branch-and-cut, CBC) solvers. The package documentation and an extensive user guide can be found at https://d3m-research-group.github.io/odtlearn/. Additionally, users can view the package source code and submit feature requests and bug reports by visiting https://github.com/D3M-Research-Group/odtlearn.
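To make concrete what "optimal" means here, the following is a minimal sketch that finds a depth-2 tree with the fewest training errors by exhaustive enumeration over binary features. It is not the package's MIO formulation and does not use the ODTlearn API; the function names (`leaf_errors`, `best_depth2_tree`) and the toy data are assumptions made up for illustration. A MIO solver searches the same space of trees but avoids enumerating it explicitly.

```python
from collections import Counter
from itertools import product


def leaf_errors(labels):
    """Misclassifications in a leaf that predicts its majority label."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]


def best_depth2_tree(X, y):
    """Enumerate every depth-2 tree over binary features.

    Returns (errors, root_feature, left_feature, right_feature), where the
    left child handles rows with root feature 0 and the right child handles
    rows with root feature 1.
    """
    n_features = len(X[0])
    best = None
    for root, left, right in product(range(n_features), repeat=3):
        leaves = {}  # (root value, child value) -> labels routed to that leaf
        for row, label in zip(X, y):
            child = right if row[root] else left
            leaves.setdefault((row[root], row[child]), []).append(label)
        errors = sum(leaf_errors(v) for v in leaves.values())
        if best is None or errors < best[0]:
            best = (errors, root, left, right)
    return best


# Toy data (hypothetical): the label is the XOR of features 0 and 1,
# and feature 2 is noise.
X = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 1, 1]]
y = [0, 1, 1, 0, 0, 0]
result = best_depth2_tree(X, y)
print(result)
```

On this XOR-style data no single split separates the classes, but the exhaustive search finds a depth-2 tree with zero training errors. Enumeration grows quickly with depth and feature count, which is one reason a solver-based formulation is attractive.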
Problem

Research questions and friction points this paper is trying to address.

Learning optimal decision trees for prediction tasks
Learning optimal classification trees that are fair and robust to distribution shifts
Creating optimal prescriptive trees from observational data
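As a hedged illustration of the kind of quantity a fair classification tree might constrain, the sketch below computes a statistical parity gap: the largest difference in positive-prediction rates between groups. The source does not say which fairness notions the package implements, so this particular measure, the function name `statistical_parity_gap`, and the toy data are assumptions for illustration only.

```python
def statistical_parity_gap(predictions, groups, positive=1):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = {}
    for g in set(groups):
        members = [p for p, gi in zip(predictions, groups) if gi == g]
        rates[g] = sum(p == positive for p in members) / len(members)
    return max(rates.values()) - min(rates.values())


# Hypothetical predictions for two groups of four individuals each.
preds = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = statistical_parity_gap(preds, groups)
print(gap)
```

A fairness-constrained learner would bound a quantity like this gap while minimizing prediction error, trading some accuracy for more even treatment across groups.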
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a mixed-integer optimization (MIO) framework
Implements optimal classification and prescriptive trees
Supports both commercial and open-source solvers
Authors

Patrick Vossler
University of California, San Francisco
Statistics, Causal Inference, Machine Learning, High-dimensional statistics, Feature selection
S. Aghaei
University of Southern California, Center for AI in Society, Los Angeles, CA 90089
Nathan Justin
PhD Candidate, University of Southern California
Optimization, Machine Learning, Operations Research
Nathanael Jo
Massachusetts Institute of Technology
Andrés Gómez
University of Southern California, Center for AI in Society, Los Angeles, CA 90089
P. Vayanos
University of Southern California, Center for AI in Society, Los Angeles, CA 90089