🤖 AI Summary
To address the poor interpretability and high computational cost of black-box models on low-resource tabular data, this paper proposes a decision tree generation paradigm that leverages the structured reasoning capabilities of large language models (LLMs). Methodologically, the authors design a lightweight toolset that lets the LLM act as an intelligent agent, combining domain priors with data-driven learning to construct editable and auditable decision trees; human-in-the-loop intervention is supported for bias correction and domain-knowledge injection, with explicit, inspectable reasoning traces generated throughout. The key contribution is the first systematic integration of LLMs' structured reasoning into decision tree induction, achieving a balanced trade-off among predictive performance, interpretability, and controllability. Experiments show that the approach significantly outperforms CART in low-resource settings and remains competitive with, though slightly behind, state-of-the-art black-box models. Crucially, it yields lightweight, fully transparent, production-deployable decision tree models.
📝 Abstract
Tabular foundation models are becoming increasingly popular for low-resource tabular problems. These models make up for small training datasets by pretraining on large volumes of synthetic data. The prior knowledge obtained via pretraining provides exceptional performance, but the resulting model becomes a black box that is difficult to interpret and costly to run at inference time. In this work, we explore an alternative strategy: using reasoning-capable LLMs to induce decision trees for small tabular datasets in an agentic setup. We design a minimal set of tools for constructing, analyzing, and manipulating decision trees. Using these tools, LLMs combine their prior knowledge with learning from data to create a lightweight decision tree that outperforms traditional CART on low-resource tabular problems. While a single decision tree does not outperform state-of-the-art black-box models, it comes with a human-readable reasoning trace that can be checked for biases and data leaks. Furthermore, the LLM's reasoning-based construction process allows for additional human input: correcting biases or incorporating domain-specific intuition that is not captured in the data.
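
To make the agentic setup concrete, below is a minimal, hypothetical sketch of what such a decision-tree toolset could look like. The tree representation and the tool names (`split_node`, `set_leaf`, `evaluate`) are illustrative assumptions, not the paper's actual interface.

```python
# Hypothetical sketch of a minimal decision-tree toolset an LLM agent
# could call. Names and structure are illustrative assumptions only.
from dataclasses import dataclass
from typing import Optional
import numpy as np


@dataclass
class Node:
    feature: Optional[int] = None      # index of the feature to split on
    threshold: Optional[float] = None  # split threshold (go left if <=)
    label: Optional[int] = None        # class label if this node is a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def split_node(node: Node, feature: int, threshold: float) -> Node:
    """Tool: turn a leaf into an internal node with two child leaves."""
    node.feature, node.threshold, node.label = feature, threshold, None
    node.left, node.right = Node(), Node()
    return node


def set_leaf(node: Node, label: int) -> Node:
    """Tool: assign a class label to a leaf."""
    node.label = label
    return node


def predict_one(node: Node, x: np.ndarray) -> int:
    """Route a single sample down the tree to a leaf label."""
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def evaluate(root: Node, X: np.ndarray, y: np.ndarray) -> float:
    """Tool: report accuracy so the agent can assess its current tree."""
    preds = np.array([predict_one(root, x) for x in X])
    return float((preds == y).mean())


# Example: an agent could issue these tool calls to build a one-split tree,
# inspect the resulting accuracy, and decide whether to refine further.
X = np.array([[0.2], [0.4], [0.8], [0.9]])
y = np.array([0, 0, 1, 1])
root = split_node(Node(), feature=0, threshold=0.5)
set_leaf(root.left, 0)
set_leaf(root.right, 1)
print(evaluate(root, X, y))  # -> 1.0 on this toy dataset
```

Because every tool call is an explicit, loggable operation, the sequence of calls itself forms the human-readable construction trace the abstract describes, and a human can intervene at any step to correct a split or inject domain knowledge.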