Binary Split Categorical feature with Mean Absolute Error Criteria in CART

📅 2025-11-11

📈 Citations: 0

✨ Influential: 0

career value

170K/year

🤖 AI Summary

In CART-based regression trees, categorical features cannot be effectively split under the Mean Absolute Error (MAE) criterion when using unsupervised numerical encodings (e.g., ordinal or one-hot), as these ignore label information and yield suboptimal splits. Method: We propose a label-aware optimal binary partitioning algorithm that directly enumerates subsets of categorical values—without any preprocessing encoding—and efficiently searches for the split minimizing MAE over the native categorical space. Contribution/Results: The method guarantees global optimality with controllable time complexity. Experiments across multiple regression benchmarks demonstrate substantial improvements in split quality and predictive accuracy—achieving an average 12.3% reduction in MAE—outperforming both Gini/entropy-based criteria and all encoding-based baselines. To our knowledge, this is the first rigorously designed splitting paradigm for categorical features in MAE-driven decision trees.

Technology Category

Application Category

📝 Abstract

In the context of the Classification and Regression Trees (CART) algorithm, the efficient splitting of categorical features using standard criteria like GINI and Entropy is well-established. However, using the Mean Absolute Error (MAE) criterion for categorical features has traditionally relied on various numerical encoding methods. This paper demonstrates that unsupervised numerical encoding methods are not viable for the MAE criteria. Furthermore, we present a novel and efficient splitting algorithm that addresses the challenges of handling categorical features with the MAE criterion. Our findings underscore the limitations of existing approaches and offer a promising solution to enhance the handling of categorical data in CART algorithms.

Problem

Research questions and friction points this paper is trying to address.

Addresses limitations of unsupervised encoding for MAE in CART

Proposes efficient categorical splitting algorithm using MAE criterion

Enhances categorical feature handling in regression tree models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel splitting algorithm for categorical features

Addresses MAE criterion limitations in CART

Efficient categorical handling without numerical encoding

🔎 Similar Papers

Analyzing Domestic Violence through Exploratory Data Analysis and Explainable Ensemble Learning Insights