🤖 AI Summary
This paper addresses the trade-off between model complexity and training window length in stock return forecasting under nonstationarity: complex models reduce misspecification error but need longer training windows, which expose them to more nonstationarity. We propose a tournament-based model selection framework that jointly chooses the model class and the rolling window length using nonstationary validation data. Theoretically, the approach balances three sources of error: misspecification, estimation variance, and nonstationarity. Empirically, on 17 industry portfolios, it improves out-of-sample R² by 14–23% on average. Gains are substantial during NBER-designated recessions: the method achieves positive out-of-sample R² during the Gulf War recession, where the benchmarks are negative, and it also improves on the benchmarks during the 2001 recession and the 2008 financial crisis. A trading strategy based on the selected model delivers 31% higher cumulative returns, averaged across industries.
📝 Abstract
We investigate machine learning models for stock return prediction in non-stationary environments, revealing a fundamental nonstationarity-complexity tradeoff: complex models reduce misspecification error but require longer training windows that introduce stronger non-stationarity. We resolve this tension with a novel model selection method that jointly optimizes model class and training window size using a tournament procedure that adaptively evaluates candidates on non-stationary validation data. Our theoretical analysis demonstrates that this approach balances misspecification error, estimation variance, and non-stationarity, performing close to the best model in hindsight. Applying our method to 17 industry portfolio returns, we consistently outperform standard rolling-window benchmarks, improving out-of-sample $R^2$ by 14-23% on average. During NBER-designated recessions, improvements are substantial: our method achieves positive $R^2$ during the Gulf War recession while benchmarks are negative, improves $R^2$ in absolute terms by at least 80 bps during the 2001 recession, and delivers superior performance during the 2008 Financial Crisis. Economically, a trading strategy based on our selected model generates 31% higher cumulative returns averaged across the industries.
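To make the idea of jointly selecting a model class and a training window concrete, the sketch below may help. It is not the paper's procedure: the abstract does not specify how the tournament adapts its evaluation, so this version uses a fixed validation span of the most recent `n_val` periods and a simple sequential pairwise comparison, which reduces to picking the lowest validation loss. The names `fit_mean`, `fit_ols`, `validation_loss`, and `tournament_select`, as well as the synthetic data, are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_mean(X, y):
    """Simplest candidate: predict the training-window mean return."""
    m = float(np.mean(y))
    return lambda Z: np.full(len(Z), m)

def fit_ols(X, y):
    """More complex candidate: linear regression with an intercept."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef

def validation_loss(fit, window, X, y, n_val):
    """Mean squared one-step-ahead error over the last n_val periods.

    Each forecast for period t is trained only on the `window` periods
    that precede t (fewer if the sample starts later).
    """
    T = len(y)
    if not 0 < n_val < T:
        raise ValueError("n_val must satisfy 0 < n_val < len(y)")
    errors = []
    for t in range(T - n_val, T):
        lo = max(0, t - window)
        predict = fit(X[lo:t], y[lo:t])
        errors.append((y[t] - predict(X[t:t + 1])[0]) ** 2)
    return float(np.mean(errors))

def tournament_select(candidates, X, y, n_val):
    """Sequential pairwise comparison over (name, fit, window) candidates.

    Assumption: a fixed validation span stands in for the adaptive
    evaluation described in the abstract. The current champion is
    replaced only when a challenger has strictly lower validation loss.
    """
    champion = candidates[0]
    champion_loss = validation_loss(champion[1], champion[2], X, y, n_val)
    for challenger in candidates[1:]:
        loss = validation_loss(challenger[1], challenger[2], X, y, n_val)
        if loss < champion_loss:
            champion, champion_loss = challenger, loss
    return champion, champion_loss

if __name__ == "__main__":
    # Synthetic data with a regime shift in the slope (illustrative only).
    rng = np.random.default_rng(0)
    T = 240
    X = rng.normal(size=(T, 1))
    slope = np.where(np.arange(T) < 180, 0.5, -0.5)
    y = slope * X[:, 0] + 0.1 * rng.normal(size=T)

    candidates = [
        ("mean", fit_mean, 24),
        ("ols", fit_ols, 24),
        ("ols", fit_ols, 120),
    ]
    (name, _, window), loss = tournament_select(candidates, X, y, n_val=24)
    print(f"selected model={name}, window={window}, validation MSE={loss:.4f}")
```

Treating each (model class, window) pair as one candidate is what makes the selection joint: a complex model with a short window and a simple model with a long window compete directly on the same recent data.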