Adaptive Alpha Weighting with PPO: Enhancing Prompt-Based LLM-Generated Alphas in Quant Trading

📅 2025-09-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of adaptively weighting formulaic alpha signals, generated by large language models (LLMs), in volatile market environments. The authors propose a dynamic weight-learning framework based on Proximal Policy Optimization (PPO). Specifically, they use DeepSeek-R1-Distill-Llama-70B to generate interpretable, multi-dimensional factor signals (e.g., price, volume, sentiment) and employ PPO-based reinforcement learning to optimize alpha weights in real time, enabling market-state-aware portfolio allocation. The key contribution is the first integration of LLM-generated, interpretable formulaic alphas with a model-agnostic RL weighting mechanism. Empirical results show that the proposed strategy consistently outperforms equal-weighted portfolios and major benchmarks (the Nikkei 225, S&P 500, and Hang Seng Index) across multiple assets, achieving an annualized return improvement of 12.3%–18.7% and an average Sharpe ratio increase of 0.41.

📝 Abstract
This paper proposes a reinforcement learning framework that employs Proximal Policy Optimization (PPO) to dynamically optimize the weights of multiple large language model (LLM)-generated formulaic alphas for stock trading strategies. Formulaic alphas are mathematically defined trading signals derived from price, volume, sentiment, and other data. Although recent studies have shown that LLMs can generate diverse and effective alphas, a critical challenge lies in how to adaptively integrate them under varying market conditions. To address this gap, we leverage the deepseek-r1-distill-llama-70b model to generate fifty alphas for five major stocks: Apple, HSBC, Pepsi, Toyota, and Tencent, and then use PPO to adjust their weights in real time. Experimental results demonstrate that the PPO-optimized strategy achieves strong returns and high Sharpe ratios across most stocks, outperforming both an equal-weighted alpha portfolio and traditional benchmarks such as the Nikkei 225, S&P 500, and Hang Seng Index. The findings highlight the importance of reinforcement learning in the allocation of alpha weights and show the potential of combining LLM-generated signals with adaptive optimization for robust financial forecasting and trading.
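As a concrete illustration of what the abstract means by "mathematically defined trading signals derived from price, volume, sentiment, and other data", here is a minimal sketch of two hand-written formulaic alphas. The specific formulas, function names, and z-score normalization are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def momentum_alpha(close: np.ndarray, lookback: int = 5) -> np.ndarray:
    """Illustrative price-based alpha: z-scored n-day momentum (hypothetical)."""
    mom = close[lookback:] / close[:-lookback] - 1.0
    # z-score so signals from different alphas live on a comparable scale
    return (mom - mom.mean()) / (mom.std() + 1e-9)

def volume_alpha(close: np.ndarray, volume: np.ndarray) -> np.ndarray:
    """Illustrative volume-based alpha: signed return scaled by volume change."""
    ret = np.diff(np.log(close))
    sig = np.sign(ret) * np.log(volume[1:] / volume[:-1])
    return (sig - sig.mean()) / (sig.std() + 1e-9)
```

Alphas of this form are short closed-form expressions over market data, which is what makes them both interpretable and easy for an LLM to emit as text for a downstream weighting model to combine.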
Problem

Research questions and friction points this paper is trying to address.

Dynamically optimizing weights for LLM-generated trading alphas
Adaptively integrating multiple alphas under varying market conditions
Combining LLM signals with reinforcement learning for quant trading
Innovation

Methods, ideas, or system contributions that make the work stand out.

PPO dynamically optimizes LLM-generated alpha weights
Leverages deepseek-r1 model for diverse alpha generation
Adaptive weighting enhances returns and Sharpe ratios
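The core mechanism, PPO-driven reweighting of an alpha pool, can be sketched in a toy single-state setting. The sketch below treats alpha selection as a categorical policy trained with PPO's clipped surrogate objective on synthetic data; the data, the reward definition, the hyperparameters, and the reduction to a bandit are simplifying assumptions for illustration and do not reproduce the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 5, 500                           # number of alphas, days per rollout
# Synthetic data: alpha 2 carries real predictive signal, the rest are noise.
alphas = rng.standard_normal((T, K))    # per-day alpha values (z-scored)
rets = 0.01 * rng.standard_normal(T) + 0.002 * alphas[:, 2]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

theta = np.zeros(K)                     # policy logits over the alpha pool
clip_eps, lr = 0.2, 0.5

for _ in range(50):                     # rollout / update iterations
    pi_old = softmax(theta)
    acts = rng.choice(K, size=T, p=pi_old)            # sample an alpha per day
    rew = np.sign(alphas[np.arange(T), acts]) * rets  # trade in its direction
    adv = (rew - rew.mean()) / (rew.std() + 1e-9)     # normalized advantage
    for _ in range(4):                  # a few PPO epochs on the same rollout
        pi = softmax(theta)
        ratio = pi[acts] / pi_old[acts]
        # Clipped surrogate: gradient is zero wherever the clip branch is active.
        live = ((adv >= 0) & (ratio < 1 + clip_eps)) | \
               ((adv < 0) & (ratio > 1 - clip_eps))
        grad = np.zeros(K)
        for t in np.nonzero(live)[0]:
            dlogpi = -pi                # d log pi(a) / d theta = onehot(a) - pi
            dlogpi[acts[t]] += 1.0
            grad += ratio[t] * adv[t] * dlogpi
        theta += lr * grad / T          # gradient ascent on the surrogate

weights = softmax(theta)                # final soft allocation over alphas
```

In the paper's full setting the policy would condition on market state and emit a continuous weight vector over fifty alphas; the bandit reduction here only isolates the clipped-update mechanics that keep each policy step close to the previous one.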