🤖 AI Summary
To address the low efficiency, poor scalability, and rapid decay of manually engineered formulaic alphas (α), this paper proposes an automated α-generation framework that integrates large language models (LLMs) with multimodal time-series modeling. Methodologically, prompt engineering guides an LLM to parse structured financial data (stock prices, technical indicators, and firm-level sentiment scores) and generate interpretable, adaptive symbolic α expressions; these expressions serve as high-level features for downstream forecasting models (e.g., Transformer, LSTM, TCN, SVR, and Random Forest). The key contributions are: (i) the first direct use of LLM-generated formulaic α for stock price prediction; and (ii) incorporation of sentiment-aware reasoning and natural language inference to enhance α’s semantic validity and interpretability. Empirical results show that the proposed α consistently improves predictive accuracy across all evaluated models, reducing average MAE by 12.7%, while providing human-understandable, decision-relevant rationales.
📝 Abstract
Traditionally, traders and quantitative analysts address alpha decay by manually crafting formulaic alphas (mathematical expressions that identify patterns or signals in financial data) through domain expertise and trial and error. This process is time-consuming and difficult to scale. Recent advances in large language models (LLMs) make it possible to automate the generation of such alphas by leveraging the models' reasoning capabilities. This paper introduces a novel framework that integrates a prompt-based LLM with a Transformer model for stock price prediction. The LLM first generates diverse and adaptive alphas from structured inputs such as historical stock features (Close, Open, High, Low, Volume), technical indicators, and sentiment scores of both the target company and related companies. Rather than being used directly for trading, these alphas are treated as high-level features that capture complex dependencies within the financial data. To evaluate the effectiveness of the LLM-generated formulaic alphas, the alpha features are fed into prediction models such as Transformer, LSTM, TCN, SVR, and Random Forest to forecast future stock prices. Experimental results demonstrate that the LLM-generated alphas significantly improve predictive accuracy. Moreover, the natural language reasoning that accompanies each alpha enhances the interpretability and transparency of the predictions, supporting more informed financial decision-making.
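To make the idea of an alpha used as a feature rather than a trading signal concrete, the following is a minimal Python sketch. The formula itself is hypothetical and is not taken from the paper: it scales the candle body relative to the daily range by a sentiment score, merely to illustrate the kind of symbolic expression an LLM might propose. The function names, the sample bars, and the sentiment values are likewise illustrative assumptions.

```python
# Illustrative sketch only: the alpha below is a made-up example of a
# formulaic alpha, not one generated or reported in the paper.

def formulaic_alpha(bar, sentiment):
    """Hypothetical alpha: (Close - Open) / (High - Low) * (1 + sentiment)."""
    span = bar["high"] - bar["low"]
    if span == 0:
        # Flat bar: no intraday range, so the signal is defined as zero.
        return 0.0
    return (bar["close"] - bar["open"]) / span * (1.0 + sentiment)


def make_dataset(bars, sentiments):
    """Pair each day's [close, alpha] features with the next day's close."""
    alphas = [formulaic_alpha(b, s) for b, s in zip(bars, sentiments)]
    X = [[bars[t]["close"], alphas[t]] for t in range(len(bars) - 1)]
    y = [bars[t + 1]["close"] for t in range(len(bars) - 1)]
    return X, y


# Toy OHLCV bars and sentiment scores (assumed values, not real data).
bars = [
    {"open": 10.0, "high": 12.0, "low": 9.0, "close": 11.0, "volume": 1000},
    {"open": 11.0, "high": 11.5, "low": 10.0, "close": 10.5, "volume": 1200},
    {"open": 10.5, "high": 11.0, "low": 10.0, "close": 10.8, "volume": 900},
]
sentiments = [0.5, -0.2, 0.1]

X, y = make_dataset(bars, sentiments)
for features, target in zip(X, y):
    print(features, "->", target)
```

The resulting `X` and `y` could then be passed to any of the forecasting models named in the abstract; the alpha column simply sits alongside the raw price features as an additional input.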