🤖 AI Summary
Problem: Formulaic alpha factor discovery in quantitative investing suffers from low search efficiency and poor interpretability. Method: This paper proposes the first large language model (LLM)-guided Monte Carlo Tree Search (MCTS) framework, in which an LLM injects prior knowledge to guide symbolic formula generation, financial backtesting feedback drives iterative refinement, and a frequent-subtree avoidance mechanism improves search efficiency. Contribution/Results: Evaluated on real A-share data, the method raises the mean Information Coefficient (IC) by 23.6% and the out-of-sample Sharpe ratio by 0.41 relative to genetic programming and reinforcement learning baselines, while preserving the structural transparency and semantic interpretability of the discovered factors. The core contribution is a backtest-driven, LLM-augmented, prunable symbolic search paradigm that jointly optimizes search efficiency, predictive performance, and human interpretability.
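For readers unfamiliar with the IC metric mentioned above, the following is a minimal sketch of how a mean IC is commonly computed: the cross-sectional correlation between factor values and subsequent returns on each day, averaged over days. The summary does not say whether the paper uses Pearson or rank correlation, so Pearson is an assumption here, and the function names and toy data are purely illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def ic_mean(factor_by_day, returns_by_day):
    """Average the per-day cross-sectional correlation (assumed definition of mean IC)."""
    daily_ic = [pearson(f, r) for f, r in zip(factor_by_day, returns_by_day)]
    return sum(daily_ic) / len(daily_ic)


# Toy data: two days, three stocks each (illustrative values, not from the paper).
factor = [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0]]
next_ret = [[0.01, 0.02, 0.03], [0.03, 0.01, 0.02]]
mean_ic = ic_mean(factor, next_ret)
```

In this toy case the factor ranks stocks exactly as their next-period returns do, so the mean IC sits at its upper bound of 1; real factors typically have mean ICs much closer to zero.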
📝 Abstract
Alpha factor mining is pivotal in quantitative investment for identifying predictive signals from complex financial data. While traditional formulaic alpha mining relies on human expertise, contemporary automated methods, such as those based on genetic programming or reinforcement learning, often suffer from search inefficiency or yield poorly interpretable alpha factors. This paper introduces a novel framework that integrates Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS) to overcome these limitations. Our approach leverages the LLM's instruction-following and reasoning capabilities to iteratively generate and refine symbolic alpha formulas within an MCTS-driven exploration. A key innovation is that MCTS exploration is guided by rich, quantitative feedback from financial backtesting of each candidate factor, enabling efficient navigation of the vast search space. Furthermore, a frequent subtree avoidance mechanism is introduced to improve both search efficiency and alpha factor performance. Experimental results on real-world stock market data demonstrate that our LLM-based framework outperforms existing methods, mining alphas with higher predictive accuracy, stronger trading performance, and better interpretability, while offering a more efficient solution for formulaic alpha mining.
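The abstract does not spell out how frequent subtree avoidance works, so the sketch below is only one plausible reading: count the operator-rooted subtrees that recur across already-mined formulas, then flag new candidates that reuse the most common ones. The nested-tuple representation, the operator names (`ts_mean`, `ts_std`, `rank`, and so on), and all function names are hypothetical and not taken from the paper.

```python
from collections import Counter


def subtrees(expr):
    """Yield every operator-rooted subtree of a nested-tuple expression."""
    if isinstance(expr, tuple):
        yield expr
        for child in expr[1:]:  # expr[0] is the operator name
            yield from subtrees(child)


def frequent_subtrees(factors, top_k=1):
    """Return the top_k subtrees that appear in the most factors."""
    counts = Counter()
    for f in factors:
        counts.update(set(subtrees(f)))  # count each subtree once per factor
    return [tree for tree, _ in counts.most_common(top_k)]


def uses_any(expr, banned):
    """True if the expression contains any banned subtree."""
    return any(tree in banned for tree in subtrees(expr))


# Hypothetical pool of previously mined formulas.
mined = [
    ("div", ("ts_mean", "close", 5), "open"),
    ("sub", ("ts_mean", "close", 5), "vwap"),
    ("rank", ("ts_std", "volume", 10)),
]
banned = frequent_subtrees(mined, top_k=1)

candidate = ("add", ("ts_mean", "close", 5), "high")
should_avoid = uses_any(candidate, banned)
```

Here `("ts_mean", "close", 5)` appears in two of the three mined formulas, so a candidate built on it would be flagged. In a full system such a flag might be passed to the LLM as an instruction to avoid that structure, though how the paper actually applies it is not stated in this abstract.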