Reasoning through Verifiable Forecast Actions: Consistency-Grounded RL for Financial LLMs

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing financial forecasting approaches often decouple time series modeling from linguistic reasoning, leading to inconsistencies between qualitative analysis and quantitative predictions. This work proposes StockR1, a novel framework that introduces structured prediction actions as a bridge between language models and time series forecasting: it first generates verifiable market views and then decodes future price trajectories conditioned on these actions. Leveraging a tool-augmented architecture, a temporal decoder, and a consistency reward mechanism weighted by uncertainty, the model is jointly optimized via reinforcement learning to enhance answer validity, predictive accuracy, and alignment between actions and time series outputs. Evaluated on a decade-long large-scale financial benchmark, StockR1 improves reasoning accuracy by 17.7% (4B) and 25.9% (8B) over strong baselines, significantly outperforming both general-purpose large language models and dedicated time series models.
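To make the idea of a verifiable action concrete, the following is a minimal sketch of how a structured forecast action could be checked against a price trajectory. The summary does not specify the action schema, so the `ForecastAction` fields (`direction`, `horizon`), the `flat_band` threshold, and both helper functions are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ForecastAction:
    """Hypothetical structured market view (schema assumed, not from the paper)."""
    direction: str  # one of "up", "down", "flat"
    horizon: int    # number of future steps the view covers


def realized_direction(last_price: float, trajectory: Sequence[float],
                       flat_band: float = 0.01) -> str:
    """Label a trajectory by its total return relative to the last observed price."""
    if not trajectory:
        raise ValueError("trajectory must contain at least one price")
    total_return = trajectory[-1] / last_price - 1.0
    if total_return > flat_band:
        return "up"
    if total_return < -flat_band:
        return "down"
    return "flat"


def is_consistent(action: ForecastAction, last_price: float,
                  trajectory: Sequence[float], flat_band: float = 0.01) -> bool:
    """Check whether the trajectory over the action's horizon matches the stated view."""
    window = trajectory[:action.horizon]
    return realized_direction(last_price, window, flat_band) == action.direction


if __name__ == "__main__":
    view = ForecastAction(direction="up", horizon=3)
    print(is_consistent(view, 100.0, [101.0, 102.0, 103.0]))
    print(is_consistent(view, 100.0, [99.0, 98.0, 97.0]))
```

Because the action is a small typed object rather than free text, the same check can in principle be applied to either a decoded trajectory or the observed series, which is what makes the view verifiable in this sketch.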

📝 Abstract

Financial markets are characterized by extreme non-stationarity, low signal-to-noise ratios, and strong dependence on external information such as news, company fundamentals, and macroeconomic signals. Yet existing approaches either abstract time series into text or decouple forecasting from language-based reasoning, leading to a fundamental mismatch between qualitative reasoning and quantitative outcomes. To address this, we introduce StockR1, a time-series-enhanced LLM that unifies stock forecasting and financial reasoning through a verifiable forecast action. Based on a tool-call design, the model first emits a forecast action, which is a structured and interpretable representation of its qualitative market outlook. It then invokes a time-series decoder conditioned on this action to generate distributional future trajectories, leading to more informed question answering and financial reasoning. We optimize the full pipeline with reinforcement learning, where rewards jointly reflect answer validity, forecast accuracy, and consistency between generated actions and observed time-series dynamics. In addition, rewards are reweighted by a sample-level uncertainty scalar, encouraging the model to accommodate varying uncertainty in market dynamics. We evaluate StockR1 on financial question answering and stock forecasting over a large-scale 10-year benchmark. Our method consistently outperforms time-series baselines and general-purpose LLMs, improving reasoning accuracy by 17.7% (4B) and 25.9% (8B). These findings demonstrate that structuring forecast actions establishes a powerful synergy between language reasoning and temporal prediction, enabling LLMs to reason through verifiable, interpretable, and numerically grounded decisions.
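The reward design described above can be sketched as a weighted sum of three terms scaled by a per-sample uncertainty weight. This is only an illustration under stated assumptions: the abstract does not give the functional form, so the additive combination, the `1 / (1 + error)` accuracy mapping, the `1 / (1 + uncertainty)` down-weighting, and all parameter names are hypothetical.

```python
def combined_reward(answer_valid: bool, forecast_error: float, consistent: bool,
                    uncertainty: float, w_answer: float = 1.0,
                    w_forecast: float = 1.0, w_consistency: float = 1.0) -> float:
    """Illustrative uncertainty-reweighted reward (form assumed, not from the paper)."""
    if forecast_error < 0 or uncertainty < 0:
        raise ValueError("forecast_error and uncertainty must be non-negative")

    # Answer validity: 1 if the final answer is well-formed and correct, else 0.
    r_answer = 1.0 if answer_valid else 0.0
    # Forecast accuracy: map an error in [0, inf) to a score in (0, 1].
    r_forecast = 1.0 / (1.0 + forecast_error)
    # Consistency: 1 if the emitted action agrees with observed dynamics, else 0.
    r_consistency = 1.0 if consistent else 0.0

    raw = (w_answer * r_answer
           + w_forecast * r_forecast
           + w_consistency * r_consistency)

    # Assumed reweighting: samples with higher uncertainty contribute less.
    weight = 1.0 / (1.0 + uncertainty)
    return weight * raw


if __name__ == "__main__":
    print(combined_reward(True, 0.0, True, uncertainty=0.0))
    print(combined_reward(True, 0.0, True, uncertainty=1.0))
```

In this sketch a perfectly answered, perfectly forecast, consistent sample earns the maximum reward, and the same sample earns half as much when its uncertainty scalar is 1. The direction of the reweighting is a design choice made here for illustration; the abstract states only that rewards are reweighted by a sample-level uncertainty scalar.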

Problem

Research questions and friction points this paper is trying to address.

financial LLMs

forecast reasoning

time-series forecasting

reasoning-forecast mismatch

non-stationary markets

Innovation

Methods, ideas, or system contributions that make the work stand out.

verifiable forecast action

consistency-grounded reinforcement learning

time-series-enhanced LLM