🤖 AI Summary
Existing large language models (LLMs) are difficult to train into reliable event forecasters because of three problems: noisy and sparse training data, knowledge cut-off, and overly simple reward structures. Method: This position paper outlines research directions for training forecasting-oriented LLMs. To mitigate these problems, it presents (1) hypothetical event Bayesian networks; (2) the use of poorly-recalled and counterfactual events; and (3) auxiliary reward signals. For data, it proposes aggressive use of market, public, and crawled datasets to enable large-scale training and evaluation. Contribution/Results: The paper argues that, given recent progress in LLM forecasting, reinforcement learning, reasoning models, and Deep Research-style models, the time is ripe for large-scale training of superforecaster-level event forecasting LLMs. It presents specific paths and considerations toward that goal and discusses how these advances could enable AI to provide predictive intelligence to society in broader areas.
📝 Abstract
Many recent papers have studied the development of superforecaster-level event forecasting LLMs. While methodological problems with early studies cast doubt on the use of LLMs for event forecasting, recent studies with improved evaluation methods have shown that state-of-the-art LLMs are gradually reaching superforecaster-level performance, and reinforcement learning has also been reported to improve future forecasting. Additionally, the unprecedented success of recent reasoning models and Deep Research-style models suggests that technology capable of greatly improving forecasting performance has been developed. Therefore, based on these positive recent trends, we argue that the time is ripe for research on large-scale training of superforecaster-level event forecasting LLMs. We discuss two key research directions: training methods and data acquisition. For training, we first introduce three difficulties of LLM-based event forecasting training: noisiness-sparsity, knowledge cut-off, and simple reward structure problems. Then, we present related ideas to mitigate these problems: hypothetical event Bayesian networks, utilizing poorly-recalled and counterfactual events, and auxiliary reward signals. For data, we propose aggressive use of market, public, and crawling datasets to enable large-scale training and evaluation. Finally, we explain how these technical advances could enable AI to provide predictive intelligence to society in broader areas. This position paper presents promising specific paths and considerations for getting closer to superforecaster-level AI technology, aiming to call for researchers' interest in these directions.