Navigating the Alpha Jungle: An LLM-Powered MCTS Framework for Formulaic Factor Mining

📅 2025-05-16
🤖 AI Summary
Problem: Formulaic alpha factor discovery in quantitative investing suffers from low search efficiency and poor interpretability. Method: This paper proposes a large language model (LLM)-guided Monte Carlo Tree Search (MCTS) framework in which the LLM injects prior knowledge to guide symbolic formula generation, financial backtesting feedback drives iterative refinement, and a frequent-subtree avoidance mechanism improves search efficiency. Contribution/Results: Evaluated on real A-share data, the method improves factor predictive power over genetic programming and reinforcement learning baselines, raising the mean Information Coefficient (IC) by 23.6% and the out-of-sample Sharpe ratio by 0.41, while preserving the structural transparency and semantic interpretability of the discovered factors. The core contribution is a backtest-driven, LLM-augmented symbolic search paradigm with pruning that jointly targets search efficiency, predictive performance, and human interpretability.
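The two metrics quoted in the summary can be illustrated with a minimal sketch. It assumes the common definitions: IC as the per-day cross-sectional Pearson correlation between factor values and forward returns, averaged over days, and an annualized Sharpe ratio with a zero risk-free rate. The paper's exact evaluation protocol is not given on this page, and the function names and toy numbers below are illustrative only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def mean_ic(factor_by_day, fwd_return_by_day):
    """Average the daily cross-sectional correlation between factor and forward return."""
    daily = [pearson(f, r) for f, r in zip(factor_by_day, fwd_return_by_day)]
    return sum(daily) / len(daily)

def sharpe_ratio(period_returns, periods_per_year=252):
    """Annualized Sharpe ratio (assumption: zero risk-free rate, sample std)."""
    n = len(period_returns)
    mu = sum(period_returns) / n
    var = sum((r - mu) ** 2 for r in period_returns) / (n - 1)
    return mu / math.sqrt(var) * math.sqrt(periods_per_year)

# Toy data: three stocks over two days (made-up numbers).
factor = [[1, 2, 3], [1, 2, 3]]
fwd_ret = [[0.01, 0.02, 0.03], [0.03, 0.02, 0.01]]
print(mean_ic(factor, fwd_ret))
print(sharpe_ratio([0.01, -0.01, 0.02, 0.0]))
```

In the toy data the factor ranks the stocks perfectly on the first day and in reverse on the second, so the two daily correlations cancel in the mean. This is why IC is usually reported as an average over many days rather than for a single cross-section.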

📝 Abstract
Alpha factor mining is pivotal in quantitative investment for identifying predictive signals from complex financial data. While traditional formulaic alpha mining relies on human expertise, contemporary automated methods, such as those based on genetic programming or reinforcement learning, often suffer from search inefficiency or yield poorly interpretable alpha factors. This paper introduces a novel framework that integrates Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS) to overcome these limitations. Our approach leverages the LLM's instruction-following and reasoning capability to iteratively generate and refine symbolic alpha formulas within an MCTS-driven exploration. A key innovation is the guidance of MCTS exploration by rich, quantitative feedback from financial backtesting of each candidate factor, enabling efficient navigation of the vast search space. Furthermore, a frequent subtree avoidance mechanism is introduced to bolster search efficiency and alpha factor performance. Experimental results on real-world stock market data demonstrate that our LLM-based framework outperforms existing methods by mining alphas with superior predictive accuracy, trading performance, and improved interpretability, while offering a more efficient solution for formulaic alpha mining.
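The search loop described in the abstract can be sketched roughly as follows. This is a generic MCTS skeleton with UCT selection, not the authors' implementation: `propose_refinement` is a hypothetical stand-in for the LLM call, `backtest` is a deterministic placeholder score rather than a real backtest, and the operator names, the `max_children` expansion rule, and the exploration constant are all assumptions.

```python
import math
import random

class Node:
    def __init__(self, formula, parent=None):
        self.formula = formula
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

def propose_refinement(formula, rng):
    """Hypothetical stand-in for the LLM: wrap the formula in a random operator."""
    templates = ["rank({})", "ts_mean({}, 5)", "delta({}, 1)", "({} / volume)"]
    return rng.choice(templates).format(formula)

def backtest(formula):
    """Placeholder score in [0, 1); a real system would backtest the factor."""
    return (sum(ord(ch) for ch in formula) % 100) / 100.0

def uct(child, parent_visits, c=1.0):
    """Mean reward plus an exploration bonus for rarely visited children."""
    if child.visits == 0:
        return float("inf")
    bonus = c * math.sqrt(math.log(parent_visits) / child.visits)
    return child.total_reward / child.visits + bonus

def select(node, max_children):
    """Descend through fully expanded nodes, following the best UCT child."""
    while len(node.children) >= max_children:
        node = max(node.children, key=lambda ch: uct(ch, node.visits))
    return node

def search(root_formula, iterations=50, max_children=3, seed=0):
    rng = random.Random(seed)
    root = Node(root_formula)
    best = (backtest(root_formula), root_formula)
    for _ in range(iterations):
        node = select(root, max_children)
        child = Node(propose_refinement(node.formula, rng), parent=node)
        node.children.append(child)
        reward = backtest(child.formula)  # feedback that steers later selection
        best = max(best, (reward, child.formula))
        while child is not None:          # backpropagate up to the root
            child.visits += 1
            child.total_reward += reward
            child = child.parent
    return best, root

best, root = search("close", iterations=30)
print(best)
```

The point of the sketch is the feedback path: each candidate's score is propagated up the tree, so branches whose refinements score well are revisited more often, while the exploration bonus keeps less-visited branches from being abandoned.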
Problem

Research questions and friction points this paper is trying to address.

Automating formulaic alpha factor mining in finance
Improving search efficiency and interpretability of alpha factors
Combining LLMs with MCTS for better predictive accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM and MCTS integration for alpha mining
Financial backtesting guides MCTS exploration
Frequent subtree avoidance boosts efficiency
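One way to picture frequent subtree avoidance is to count which sub-expressions recur across already-mined formulas and flag new candidates that reuse them. The sketch below assumes formulas are stored as nested tuples of the form `(operator, arg, ...)`. The operator names, the `top_k` cutoff, and the idea of rejecting flagged candidates are illustrative assumptions; how the framework defines "frequent" and how it feeds that back to the LLM are not described on this page.

```python
from collections import Counter

def subtrees(expr):
    """Yield every operator-rooted subtree of a nested-tuple expression."""
    if isinstance(expr, tuple):
        yield expr
        for arg in expr[1:]:
            yield from subtrees(arg)

def frequent_subtrees(pool, top_k=1):
    """Return the top_k subtrees that appear in the most formulas of the pool."""
    counts = Counter()
    for expr in pool:
        counts.update(set(subtrees(expr)))  # count each subtree once per formula
    return [tree for tree, _ in counts.most_common(top_k)]

def contains_any(expr, banned):
    """True if the candidate reuses any banned subtree."""
    return any(tree in banned for tree in subtrees(expr))

# Hypothetical pool of already-mined factors.
pool = [
    ("rank", ("ts_mean", "close", 5)),
    ("div", ("ts_mean", "close", 5), "volume"),
    ("delta", "open", 1),
]
banned = frequent_subtrees(pool, top_k=1)
print(banned)
print(contains_any(("neg", ("ts_mean", "close", 5)), banned))
print(contains_any(("neg", ("delta", "close", 1)), banned))
```

Counting each subtree once per formula keeps a single deeply nested factor from dominating the statistics; what matters is how many distinct factors share a structure.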
Authors
Yu Shi
Institute for Interdisciplinary Information Sciences, Tsinghua University
Yitong Duan
Institute for Interdisciplinary Information Sciences, Tsinghua University
Machine Learning
Jian Li
Institute for Interdisciplinary Information Sciences, Tsinghua University