The When and How of Target Variable Transformations

📅 2025-04-29
🤖 AI Summary
Problem: Target variable transformation, a critical yet long-overlooked preprocessing step in machine learning regression, lacks principled guidance, which leads to suboptimal model performance. Method: We systematically investigate how such transformations affect learning, using empirical case studies, statistical diagnostics (e.g., residual distribution tests, heteroscedasticity detection), and domain-informed heuristic reasoning. Contribution/Results: We propose the first actionable decision framework for “when and how to transform,” grounded in empirically derived applicability criteria (e.g., skewness, scale imbalance, nonlinear effects, temporal trend contamination). We further formulate generalizable heuristics that map common data issues (e.g., population-size bias, inflation drift, score compression) to appropriate transformations (e.g., log, Box–Cox, quantile normalization). Extensive experiments demonstrate substantial improvements in model fit accuracy and out-of-sample stability, closing a key theoretical and practical gap in ML pipeline design, specifically in target-variable preprocessing.
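
The summary names skewness as one applicability criterion and log and Box–Cox among the candidate transformations, but it gives no formulas or thresholds. Below is a minimal Python sketch, assuming a strictly positive target. It computes the moment-based sample skewness, applies a Box–Cox transform (λ = 0 is the natural log), and inverts it so values on the transformed scale can be mapped back. The helper names and the skewness threshold of 1.0 in `suggest_lambda` are hypothetical illustrations, not the paper's rules.

```python
import math
from statistics import mean


def skewness(values):
    """Moment-based sample skewness; values > 0 indicate a long right tail."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def box_cox(y, lam):
    """Box-Cox transform of a strictly positive target; lam == 0 is the natural log."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive targets")
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam


def inv_box_cox(z, lam):
    """Invert box_cox so predictions can be reported on the original scale."""
    return math.exp(z) if lam == 0 else (lam * z + 1) ** (1 / lam)


def suggest_lambda(values, threshold=1.0):
    """Hypothetical rule of thumb: log-transform strongly right-skewed positive targets.

    The threshold of 1.0 is an illustrative assumption, not a value from the paper.
    """
    if min(values) > 0 and skewness(values) > threshold:
        return 0.0  # log transform
    return 1.0  # (y - 1) / 1 is only a shift, i.e. effectively untransformed


# A target that grows multiplicatively, e.g. prices or counts.
y = [math.exp(0.5 * i) for i in range(12)]
lam = suggest_lambda(y)
z = [box_cox(v, lam) for v in y]
print(f"lambda={lam}, skew before={skewness(y):.2f}, skew after={skewness(z):.2f}")

# Round trip: transformed values (or model predictions) map back to the original scale.
restored = [inv_box_cox(v, lam) for v in z]
```

Because the inverse is nonlinear, back-transforming a mean prediction is not the same as predicting the mean: if errors on the log scale are symmetric, exponentiating the predicted mean gives the median of the original target rather than its mean.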

📝 Abstract
The machine learning pipeline typically involves the iterative process of (1) collecting the data, (2) preparing the data, (3) learning a model, and (4) evaluating a model. Practitioners recognize the importance of the data preparation phase in terms of its impact on the ability to learn accurate models. In this regard, significant attention is often paid to manipulating the feature set (e.g., selection, transformations, dimensionality reduction). A point that is less well appreciated is that transformations on the target variable can also have a large impact on whether it is possible to learn a suitable model. These transformations may include accounting for subject-specific biases (e.g., in how someone uses a rating scale), contexts (e.g., population size effects), and general trends (e.g., inflation). However, this point has received a much more cursory treatment in the existing literature. The goal of this paper is threefold. First, we aim to highlight the importance of this problem by showing when transforming the target variable has been useful in practice. Second, we will provide a set of generic "rules of thumb" that indicate situations when transforming the target variable may be needed. Third, we will discuss which transformations should be considered in a given situation.
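
The abstract lists three kinds of target adjustment (subject-specific rating bias, population-size effects, and general trends such as inflation) without fixing a formula for any of them. The Python sketch below shows one common, simple choice for each: subtracting each rater's mean score, dividing counts by population, and rescaling nominal amounts by a price index. The function names, the example data, and the choice of these particular corrections are assumptions for illustration, not the paper's method.

```python
from collections import defaultdict
from statistics import mean

# Illustrative target corrections; names and data are hypothetical, not from the paper.


def center_by_subject(ratings):
    """Remove each rater's personal offset from (user, item, score) triples.

    A generous and a harsh rater who rank items the same way end up with
    the same centred scores.
    """
    scores_by_user = defaultdict(list)
    for user, _, score in ratings:
        scores_by_user[user].append(score)
    user_mean = {user: mean(scores) for user, scores in scores_by_user.items()}
    return [(user, item, score - user_mean[user]) for user, item, score in ratings]


def per_capita(counts, population):
    """Divide a count target by population so large and small regions are comparable."""
    return {region: counts[region] / population[region] for region in counts}


def deflate(amounts, price_index, base_year):
    """Convert nominal amounts to base-year money using a price index such as a CPI."""
    return {
        year: amount * price_index[base_year] / price_index[year]
        for year, amount in amounts.items()
    }


ratings = [("alice", "a", 4), ("alice", "b", 5), ("bob", "a", 1), ("bob", "b", 2)]
print(center_by_subject(ratings))
print(per_capita({"city": 5000, "town": 50}, {"city": 1_000_000, "town": 10_000}))
print(deflate({2000: 100.0, 2020: 150.0}, {2000: 100.0, 2020: 150.0}, base_year=2000))
```

Each correction has a simple inverse (add back the rater's mean, multiply by population, re-inflate with the index), which is what turns predictions made on the adjusted target into predictions on the original one. When raters also differ in how much of the scale they use, dividing by each rater's standard deviation in addition to subtracting the mean is a common variant.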
Problem

Research questions and friction points this paper is trying to address.

Highlight the importance of target variable transformations in ML
Provide rules for when target transformations are needed
Recommend suitable transformations for different scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Show that transforming the target variable can improve model accuracy
Offer rules of thumb for spotting when a target transformation is needed
Suggest suitable transformations based on context