RETuning: Upgrading Inference-Time Scaling for Stock Movement Prediction with Large Language Models

πŸ“… 2025-10-24
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Contemporary large language models (LLMs) exhibit two critical limitations in stock price forecasting: (1) excessive reliance on analyst opinions and insufficient independent reasoning, and (2) an inability to weigh contradictory evidence effectively, resulting in unreliable predictions. To address these issues, we propose RETuning, a reflective evidence tuning framework that dynamically constructs an analytical reasoning structure during inference. RETuning integrates heterogeneous long-context inputs (up to 32K tokens), including stock prices, news articles, research reports, fundamental data, and macroeconomic indicators, and performs evidence scoring and adaptive weighting. Crucially, it reduces dependence on prior analyst views, enabling cold-start adaptation and out-of-distribution generalization. Evaluated on a comprehensive A-share dataset comprising over 200,000 samples, RETuning significantly improves accuracy on the three-class prediction task (up/hold/down), maintains robust inference-time scaling for over six months, and demonstrates strong transferability to unseen stocks.

πŸ“ Abstract
Recently, large language models (LLMs) have demonstrated outstanding reasoning capabilities on mathematical and coding tasks. However, their application to financial tasks, especially the most fundamental task of stock movement prediction, remains underexplored. We study a three-class classification problem (up, hold, down) and, by analyzing existing reasoning responses, observe that: (1) LLMs follow analysts' opinions rather than exhibiting systematic, independent analytical logic in their chains of thought (CoTs); (2) LLMs list summaries from different sources without weighing adversarial evidence, yet such counterevidence is crucial for reliable prediction. This shows that the models do not make good use of their reasoning ability to complete the task. To address this, we propose Reflective Evidence Tuning (RETuning), a cold-start method applied prior to reinforcement learning, to enhance prediction ability. While generating a CoT, RETuning encourages the model to dynamically construct an analytical framework from diverse information sources, to organize and score evidence for price movement up or down based on that framework rather than on contextual viewpoints, and finally to reflect in order to derive the prediction. This approach maximally aligns the model with its learned analytical framework, ensuring independent logical reasoning and reducing undue influence from context. We also build a large-scale dataset spanning all of 2024 for 5,123 A-share stocks, with long contexts (32K tokens) and over 200K samples. In addition to prices and news, it incorporates analysts' opinions, quantitative reports, fundamental data, macroeconomic indicators, and similar stocks. Experiments show that RETuning successfully unlocks the model's reasoning ability in the financial domain. Inference-time scaling still works even after six months or on out-of-distribution stocks, since the models gain valuable insights about stock movement prediction.
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs' independent reasoning for stock movement prediction
Reducing reliance on analysts' opinions in financial forecasting
Improving adversarial evidence integration in stock price analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic analytical framework construction from diverse sources
Evidence scoring for price movement based on framework
Cold-start method enhancing reasoning before reinforcement learning
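The evidence-scoring step described above can be illustrated with a minimal sketch. All names, the `Evidence` structure, and the `hold_band` threshold below are illustrative assumptions, not the paper's actual implementation; the point is only how scored, contradictory evidence is weighed into a three-class call rather than merely listed.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    claim: str       # summarized evidence item drawn from one source
    direction: int   # +1 supports "up", -1 supports "down"
    score: float     # model-assigned strength in [0, 1]

def predict_movement(evidence: list[Evidence], hold_band: float = 0.2) -> str:
    """Aggregate scored evidence into a three-class call (up / hold / down).

    The signed, score-weighted average stands in for the framework-aligned
    judgment; a small band around zero maps to "hold".
    """
    if not evidence:
        return "hold"
    net = sum(e.direction * e.score for e in evidence) / len(evidence)
    if net > hold_band:
        return "up"
    if net < -hold_band:
        return "down"
    return "hold"

# Contradictory evidence is weighed rather than merely listed:
items = [
    Evidence("strong quarterly earnings beat", +1, 0.9),
    Evidence("sector-wide macro headwinds", -1, 0.4),
    Evidence("analyst downgrade", -1, 0.3),
]
print(predict_movement(items))  # net = (0.9 - 0.4 - 0.3) / 3 ≈ 0.07 -> "hold"
```

In this toy aggregation, one strong bullish item does not override two moderate bearish items outright; the net signal lands inside the hold band, mirroring the paper's emphasis on weighing adversarial evidence instead of echoing any single source's viewpoint.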