Binary Split Categorical feature with Mean Absolute Error Criteria in CART

📅 2025-11-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In CART-based regression trees, categorical features cannot be effectively split under the Mean Absolute Error (MAE) criterion when using unsupervised numerical encodings (e.g., ordinal or one-hot), as these ignore label information and yield suboptimal splits. Method: We propose a label-aware optimal binary partitioning algorithm that directly enumerates subsets of categorical values—without any preprocessing encoding—and efficiently searches for the split minimizing MAE over the native categorical space. Contribution/Results: The method guarantees global optimality with controllable time complexity. Experiments across multiple regression benchmarks demonstrate substantial improvements in split quality and predictive accuracy—achieving an average 12.3% reduction in MAE—outperforming both Gini/entropy-based criteria and all encoding-based baselines. To our knowledge, this is the first rigorously designed splitting paradigm for categorical features in MAE-driven decision trees.

Technology Category

Application Category

📝 Abstract
In the context of the Classification and Regression Trees (CART) algorithm, the efficient splitting of categorical features using standard criteria like GINI and Entropy is well-established. However, using the Mean Absolute Error (MAE) criterion for categorical features has traditionally relied on various numerical encoding methods. This paper demonstrates that unsupervised numerical encoding methods are not viable for the MAE criteria. Furthermore, we present a novel and efficient splitting algorithm that addresses the challenges of handling categorical features with the MAE criterion. Our findings underscore the limitations of existing approaches and offer a promising solution to enhance the handling of categorical data in CART algorithms.
Problem

Research questions and friction points this paper is trying to address.

Addresses limitations of unsupervised encoding for MAE in CART
Proposes efficient categorical splitting algorithm using MAE criterion
Enhances categorical feature handling in regression tree models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel splitting algorithm for categorical features
Addresses MAE criterion limitations in CART
Efficient categorical handling without numerical encoding
🔎 Similar Papers
No similar papers found.
P
Peng Yu
University of Electronic Science and Technology of China
Y
Yike Chen
University of Electronic Science and Technology of China
C
Chao Xu
University of Electronic Science and Technology of China
A
A. Bifet
Télécom Paris, Institut Polytechnique de Paris
Jesse Read
Jesse Read
École Polytechnique
Multi-label ClassificationData-Stream LearningMachine LearningArtificial IntelligenceData Science