From Feedback Loops to Policy Updates: Reinforcement Fine-Tuning for LLM-Based Alpha Factor Discovery

📅 2026-05-14

🤖 AI Summary

This work addresses critical limitations in existing large language model (LLM)-based alpha factor discovery methods, which rely on prompt-level generate–evaluate–feedback loops. These loops are prone to context bloat, high inference cost, information dilution, and feedback drift, and they often yield structurally redundant, homogeneous factor expressions that hinder exploration. To overcome these issues, we propose QuantEvolver, a framework that formulates factor discovery as a policy learning problem. Through reinforcement fine-tuning, QuantEvolver enables a Miner LLM to internalize historical optimization experience in its parameters and consistently generate high-quality, diverse alpha factors. The framework combines a diversity-complementarity reward, a factor-specific domain-specific language (Factor DSL), regime-based backtesting (Regime Backtest), high-quality seed factors, seed–time-window training tasks, and a continuously updated Mined Factor Database. Evaluated on three real-world market benchmarks, QuantEvolver consistently improves the primary metric of each task over existing LLM-based approaches and produces a factor pool of higher quality and greater complementarity.

📝 Abstract

Modern quantitative trading increasingly relies on systematic models to extract predictive signals from large-scale financial data, where alpha factor discovery plays a central role in transforming market observations into tradable signals. Recent LLM-based methods have shown promise in automating factor generation, but most still rely on prompt-level generation–evaluation–feedback loops for iterative optimization. As the loop grows longer, repeatedly appended historical candidates and feedback can cause context explosion, increase inference cost, dilute useful information, and introduce feedback drift. Moreover, these methods often depend on very large LLMs whose stable generation preferences may lead to structurally similar expressions, redundant candidates, and search stagnation. To address these limitations, we propose QuantEvolver, a self-evolving alpha factor discovery framework based on reinforcement fine-tuning. Instead of accumulating feedback in the prompt, QuantEvolver converts executable quantitative evaluation into policy updates, enabling a Miner LLM to internalize historical optimization experience through parameter learning. Specifically, QuantEvolver constructs high-quality seed factors, builds diverse seed–time-window training tasks, generates executable Factor DSL expressions, evaluates them through Regime Backtest, and optimizes the Miner LLM with a Diversity-Complementarity Reward. During training, high-quality factors are continuously accumulated in a Mined Factor Database, which serves as the final discovered factor library. Extensive experiments on three realistic market benchmarks demonstrate the effectiveness of QuantEvolver, which consistently improves the primary evaluation metric of each task over existing LLM-based alpha factor discovery baselines and produces higher-quality, more complementary factor pools.

Problem

Research questions and friction points this paper is trying to address.

alpha factor discovery

feedback loops

context explosion

search stagnation

LLM-based generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

reinforcement fine-tuning

alpha factor discovery

self-evolving framework