Decision Trees That Remember: Gradient-Based Learning of Recurrent Decision Trees with Memory

📅 2025-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional decision trees struggle to natively model temporal dependencies in sequential data, typically relying on hand-crafted lagged features, which limits their capacity to capture long-range dependencies. To address this, we propose ReMeDe Trees: the first recurrent decision tree architecture to incorporate an RNN-style internal memory mechanism with end-to-end differentiable training. The key innovation is the joint optimization of hard, axis-aligned decision routing and memory state evolution, preserving interpretability while modeling long-term temporal dynamics. Gradient descent co-optimizes prediction outputs and hidden-state updates, eliminating the need for predefined time windows or manual feature engineering. On synthetic sequential benchmarks, ReMeDe Trees significantly outperform conventional decision trees augmented with engineered lag features, demonstrating superior modeling capability and generalization, particularly on tasks requiring long-range dependencies.

📝 Abstract
Neural architectures such as Recurrent Neural Networks (RNNs), Transformers, and State-Space Models have shown great success in handling sequential data by learning temporal dependencies. Decision Trees (DTs), on the other hand, remain a widely used class of models for structured tabular data but are typically not designed to capture sequential patterns directly. Instead, DT-based approaches for time-series data often rely on feature engineering, such as manually incorporating lag features, which can be suboptimal for capturing complex temporal dependencies. To address this limitation, we introduce ReMeDe Trees, a novel recurrent DT architecture that integrates an internal memory mechanism, similar to RNNs, to learn long-term dependencies in sequential data. Our model learns hard, axis-aligned decision rules for both output generation and state updates, optimizing them efficiently via gradient descent. We provide a proof-of-concept study on synthetic benchmarks to demonstrate the effectiveness of our approach.
Problem

Research questions and friction points this paper is trying to address.

Enhancing Decision Trees for sequential data
Integrating a memory mechanism into Decision Trees
Learning long-term dependencies without feature engineering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Recurrent Decision Trees
Internal memory mechanism
Gradient-based optimization
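The core idea, hard, axis-aligned splits that route on the current input concatenated with a hidden state, with each leaf emitting both a prediction and the next hidden state, can be sketched at inference time as follows. This is an illustrative toy, not the paper's implementation: the node parameters below are set by hand, whereas the paper learns them by gradient descent (presumably via differentiable soft splits that are hardened, which is our assumption, not a detail stated in this summary).

```python
import numpy as np

class Leaf:
    """A leaf emits a prediction and the next hidden state."""
    def __init__(self, y, h_new):
        self.y = y
        self.h_new = np.asarray(h_new, dtype=float)

class Node:
    """An internal node applies a hard, axis-aligned test to [x_t ; h_{t-1}]."""
    def __init__(self, feature, threshold, left, right):
        self.feature = feature      # index into the concatenated vector
        self.threshold = threshold
        self.left, self.right = left, right

def route(node, z):
    """Hard routing: follow axis-aligned tests down to a leaf."""
    while isinstance(node, Node):
        node = node.left if z[node.feature] <= node.threshold else node.right
    return node

def run_sequence(tree, xs, h0):
    """Process a sequence, threading the hidden state through time."""
    h, outputs = np.asarray(h0, dtype=float), []
    for x in xs:
        z = np.concatenate([np.atleast_1d(x), h])  # [x_t ; h_{t-1}]
        leaf = route(tree, z)
        outputs.append(leaf.y)
        h = leaf.h_new                              # recurrent state update
    return outputs, h

# Toy task needing memory: output 1 once any input has exceeded 0.5.
# Feature 0 is x_t, feature 1 is the single hidden-state component.
tree = Node(
    feature=1, threshold=0.5,                       # check memory first
    left=Node(0, 0.5,
              Leaf(0.0, [0.0]),                     # nothing seen yet
              Leaf(1.0, [1.0])),                    # trigger: set memory
    right=Leaf(1.0, [1.0]),                         # memory set: stay on
)

ys, _ = run_sequence(tree, [0.1, 0.9, 0.2], h0=[0.0])
print(ys)  # [0.0, 1.0, 1.0]
```

Note how the last output stays 1 even though the final input is below the threshold: the decision depends on the hidden state rather than on any fixed window of lagged inputs, which is exactly the dependence hand-crafted lag features struggle to express.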