AI Summary
This paper addresses pervasive overfitting and look-ahead bias in algorithmic trading by proposing a walk-forward validation framework that combines rigorous statistical validation with full interpretability. Methodologically: (1) it implements a rolling, information-set-constrained forward-testing paradigm across 34 independent periods; (2) it integrates hypothesis-driven signal generation (expressed in natural language), market microstructure modeling, and realistic trading constraints; and (3) it supports plug-and-play integration of novel hypothesis generators (e.g., LLMs), reinforcement learning, and OHLCV-based feature engineering. Empirical evaluation on 100 U.S. equities from 2015–2024 yields an annualized return of 0.55%, a Sharpe ratio of 0.33, a maximum drawdown of -2.76%, and a beta of just 0.058, with positive returns concentrated in the high-volatility 2020–2024 period. The aggregate result is not statistically significant (p = 0.34) and is reported as such in the interest of an honest, reproducible validation protocol. The core contribution is a validation template that jointly prioritizes statistical rigor and model interpretability.
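For readers unfamiliar with the headline metrics, the following is a minimal sketch of how an annualized Sharpe ratio, a maximum drawdown, and a market beta are commonly computed from a series of periodic returns. The function names, the 252-periods-per-year default, and the zero risk-free rate are assumptions made for illustration; the paper's own definitions may differ.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def annualized_sharpe(
    returns: Sequence[float], periods_per_year: int = 252, risk_free: float = 0.0
) -> float:
    """Mean excess return over its sample standard deviation, annualized.

    Assumption: `risk_free` is an annual rate spread evenly across periods.
    """
    excess = [r - risk_free / periods_per_year for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough decline of the compounded equity curve (<= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


def beta(strategy: Sequence[float], market: Sequence[float]) -> float:
    """Covariance of strategy and market returns divided by market variance."""
    s_bar, m_bar = mean(strategy), mean(market)
    cov = sum((s - s_bar) * (m - m_bar) for s, m in zip(strategy, market))
    var = sum((m - m_bar) ** 2 for m in market)
    return cov / var
```

Under these conventions, a beta near zero (such as the reported 0.058) means the strategy's returns move almost independently of the market's.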
Abstract
We develop a rigorous walk-forward validation framework for algorithmic trading designed to mitigate overfitting and lookahead bias. Our methodology combines interpretable hypothesis-driven signal generation with reinforcement learning and strict out-of-sample testing. The framework enforces strict information set discipline, employs rolling window validation across 34 independent test periods, maintains complete interpretability through natural language hypothesis explanations, and incorporates realistic transaction costs and position constraints. Validating five market microstructure patterns across 100 US equities from 2015 to 2024, the system yields modest annualized returns (0.55%, Sharpe ratio 0.33) with exceptional downside protection (maximum drawdown -2.76%) and market-neutral characteristics (beta = 0.058). Performance exhibits strong regime dependence, generating positive returns during high-volatility periods (0.60% quarterly, 2020-2024) while underperforming in stable markets (-0.16%, 2015-2019). We report statistically insignificant aggregate results (p-value 0.34) to demonstrate a reproducible, honest validation protocol that prioritizes interpretability and extends naturally to advanced hypothesis generators, including large language models. The key empirical finding reveals that daily OHLCV-based microstructure signals require elevated information arrival and trading activity to function effectively. The framework provides complete mathematical specifications and open-source implementation, establishing a template for rigorous trading system evaluation that addresses the reproducibility crisis in quantitative finance research. For researchers, practitioners, and regulators, this work demonstrates that interpretable algorithmic trading strategies can be rigorously validated without sacrificing transparency or regulatory compliance.
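The rolling-window protocol described in the abstract can be sketched as a generator of train/test index pairs in which every training window ends strictly before its test window begins. This is an illustrative sketch, not the authors' implementation: the function name and the window sizes are hypothetical, and the sizes below were chosen only so that the toy example produces 34 test periods. The abstract does not state the actual window lengths.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_obs: int, train_size: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, test_idx) pairs for rolling walk-forward validation.

    Each training window ends strictly before its test window starts, and
    consecutive test windows do not overlap.
    """
    start = 0
    while start + train_size + test_size <= n_obs:
        train_idx = range(start, start + train_size)
        test_idx = range(start + train_size, start + train_size + test_size)
        yield train_idx, test_idx
        start += test_size  # roll the whole window forward by one test period


# Hypothetical sizes: 40 periods of data, 6 for fitting, 1 for testing.
splits = list(walk_forward_splits(n_obs=40, train_size=6, test_size=1))
print(len(splits))  # 34 non-overlapping out-of-sample test periods
```

Any model fitting or hypothesis selection would use only `train_idx` for a given fold, which is the "information set discipline" the abstract refers to: nothing from the test window, or from any later date, may inform decisions made for that window.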