🤖 AI Summary
This paper identifies a fundamental limitation of machine learning (ML) in approximating the true data-generating process under finite samples, introducing the Limits-to-Learning Gap (LLG): a universal theoretical lower bound on the unavoidable discrepancy between a model's empirical fit and its population-level performance. Methodologically, it develops a unified LLG framework, deriving LLG-based refinements of the classic Hansen–Jagannathan asset-pricing bounds and connecting statistical learning theory, asymptotic inference, and high-dimensional financial econometrics. Empirically, applying the framework to a broad set of financial variables, including excess returns, yields, credit spreads, and valuation ratios, the authors find that implied LLGs are large in practice, so conventional ML methods can substantially understate true predictability. After correcting for the LLG, R² estimates move markedly toward their population counterparts, and out-of-sample prediction robustness and economic interpretability improve.
📝 Abstract
Machine learning (ML) methods are highly flexible, but their ability to approximate the true data-generating process is fundamentally constrained by finite samples. We characterize a universal lower bound, the Limits-to-Learning Gap (LLG), quantifying the unavoidable discrepancy between a model's empirical fit and the population benchmark. Recovering the true population $R^2$, therefore, requires correcting observed predictive performance by this bound. Using a broad set of variables, including excess returns, yields, credit spreads, and valuation ratios, we find that the implied LLGs are large. This indicates that standard ML approaches can substantially understate true predictability in financial data. We also derive LLG-based refinements to the classic Hansen and Jagannathan (1991) bounds, analyze implications for parameter learning in general-equilibrium settings, and show that the LLG provides a natural mechanism for generating excess volatility.
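The sketch below is only an illustration of the bookkeeping implied by the abstract: an observed (out-of-sample) R² is adjusted upward by an LLG term to approximate the population R². It is not the paper's estimator; the function `estimate_llg` and its predictor-to-sample scaling are hypothetical placeholders, since the abstract does not state the bound's closed form.

```python
# Illustrative sketch only: stylized "population R^2 ~ observed R^2 + LLG" correction.
# The estimate_llg() helper below is a hypothetical stand-in, NOT the paper's bound.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 240, 50                              # short sample, many predictors (finance-like)
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p) * 0.05        # weak true signal
y = X @ beta + rng.standard_normal(n)       # low signal-to-noise target (e.g., excess returns)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
model = Ridge(alpha=1.0).fit(X_tr, y_tr)
r2_obs = r2_score(y_te, model.predict(X_te))  # observed out-of-sample fit

def estimate_llg(n_obs: int, n_predictors: int) -> float:
    """Hypothetical placeholder for the LLG bound.

    Scales with the predictor-to-sample ratio purely to illustrate that the
    gap widens when samples are scarce relative to model complexity.
    """
    return n_predictors / n_obs

r2_corrected = r2_obs + estimate_llg(len(y_tr), p)  # stylized LLG correction
print(f"observed R^2: {r2_obs:.3f}, LLG-corrected R^2: {r2_corrected:.3f}")
```

Under these assumptions, the corrected figure sits above the raw out-of-sample R², mirroring the paper's finding that uncorrected ML fits can substantially understate true predictability in finite samples.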