🤖 AI Summary
This study addresses the lack of systematic evaluation of stationarity-inducing transformations across different types of non-stationary time series. The authors construct synthetic datasets with known trend, seasonality, and heteroscedasticity (alone and in combination), complemented by real-world TSA airport passenger data, and run 3,528 controlled experiments covering 14 transformation configurations, seven forecasting models, and three forecast horizons. Stationarity is measured as a consensus ratio across ten statistical tests, and mediation analysis is used to check whether achieving stationarity actually lowers forecast error. The results challenge the common assumption that transformations generally improve forecasts: transformations matched to the dataset's known non-stationarity improve accuracy in only 18% of cases; log and Box–Cox transformations are the main exception, improving accuracy on heteroscedastic data in 60–65% of cases; and differencing worsens forecasts on linear-trend series in every case tested.
📝 Abstract
Stationarity transformations are standard preprocessing in time series forecasting, yet their actual impact on accuracy across different non-stationarity types and model families has received little controlled evaluation. We construct synthetic datasets with known properties (trend, seasonality, heteroscedasticity, and combinations) and apply fourteen transformation configurations across seven models and three forecast horizons (3,528 experiments). Stationarity is quantified via consensus ratios from ten statistical tests, and each transform-dataset pair is classified as matched or mismatched based on whether the transform targets the dataset's known non-stationarity. For matched pairs, transforms improve forecasts only 18% of the time. The primary exception is variance stabilization: log and Box-Cox on heteroscedastic data improve accuracy in 60-65% of cases. Differencing a linear-trend series, a textbook use case, worsens forecasts in all cases tested. Mediation analysis confirms that while transforms achieve trend stationarity, this does not translate into lower forecast error; the mechanism is signal attenuation. Real-world validation on TSA airport passenger data corroborates these findings. Our results suggest transformation selection should be guided by empirical out-of-sample evaluation rather than theoretical stationarity assumptions.
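
One way to build intuition for the differencing result is a small simulation. The sketch below is an illustration only, not the authors' experimental setup: the series length, slope, noise level, and all function names are assumptions chosen for the example. It generates a linear trend with white noise, takes first differences, and compares the noise variance before and after. Differencing reduces the trend to a constant (the slope) but roughly doubles the noise variance, since each differenced value carries two noise terms. This is one plausible reading of the signal attenuation the abstract mentions.

```python
import random
import statistics


def linear_trend_series(n, slope, sigma, seed=0):
    """Hypothetical generator: y_t = slope * t + Gaussian noise with std sigma."""
    rng = random.Random(seed)
    return [slope * t + rng.gauss(0.0, sigma) for t in range(n)]


def first_difference(series):
    """First differences: d_t = y_t - y_{t-1} (one element shorter than the input)."""
    return [b - a for a, b in zip(series, series[1:])]


# Assumed parameters for the illustration only.
n, slope, sigma = 5000, 0.5, 1.0

y = linear_trend_series(n, slope, sigma)
dy = first_difference(y)

# Noise around the known trend in the original (level) series.
level_noise = [value - slope * t for t, value in enumerate(y)]
noise_var_levels = statistics.pvariance(level_noise)

# After differencing, the "signal" is the constant slope; the rest is noise.
mean_diff = statistics.mean(dy)
noise_var_diff = statistics.pvariance(dy)

variance_ratio = noise_var_diff / noise_var_levels

print(f"mean of differenced series: {mean_diff:.3f} (slope = {slope})")
print(f"noise variance in levels:   {noise_var_levels:.3f}")
print(f"noise variance after diff:  {noise_var_diff:.3f}")
print(f"variance ratio:             {variance_ratio:.2f}")
```

With independent noise of variance σ², the differenced noise has variance 2σ², so the printed ratio should land near 2. Whether this translates into worse forecasts depends on the model and horizon, which is why the abstract's recommendation of out-of-sample evaluation applies here too.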