AIA Forecaster: Technical Report

📅 2025-11-10
🤖 AI Summary
This study addresses the challenge of raising large language models' (LLMs) judgmental forecasting on unstructured data to the level of human superforecasters. It proposes an agent-based forecasting architecture that integrates agent-driven news retrieval, a supervisor agent that reconciles disparate forecasts for the same event, and statistical calibration that counters behavioral biases in LLMs. On ForecastBench (Karger et al., 2024), the system matches human superforecasters and surpasses prior LLM baselines. On a harder benchmark drawn from liquid prediction markets it underperforms market consensus on its own, but an ensemble of the forecaster with market consensus outperforms consensus alone, which indicates that the forecaster contributes additive information. The core contribution is the combination of agentic search, supervisor reconciliation, and bias-aware calibration.

📝 Abstract
This technical report describes the AIA Forecaster, a Large Language Model (LLM)-based system for judgmental forecasting using unstructured data. The AIA Forecaster approach combines three core elements: agentic search over high-quality news sources, a supervisor agent that reconciles disparate forecasts for the same event, and a set of statistical calibration techniques to counter behavioral biases in large language models. On the ForecastBench benchmark (Karger et al., 2024), the AIA Forecaster achieves performance equal to human superforecasters, surpassing prior LLM baselines. In addition to reporting on ForecastBench, we also introduce a more challenging forecasting benchmark sourced from liquid prediction markets. While the AIA Forecaster underperforms market consensus on this benchmark, an ensemble combining AIA Forecaster with market consensus outperforms consensus alone, demonstrating that our forecaster provides additive information. Our work establishes a new state of the art in AI forecasting and provides practical, transferable recommendations for future research. To the best of our knowledge, this is the first work that verifiably achieves expert-level forecasting at scale.
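The additive-information claim in the abstract can be illustrated with a small sketch. It assumes a fixed-weight linear blend of two probability forecasts, scored with the Brier score. The blend weight, the function names, and the toy numbers are all hypothetical; the abstract does not say how the report builds its ensemble.

```python
from typing import List, Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and binary outcomes.

    Lower is better; always answering 0.5 scores 0.25.
    """
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def blend(model: Sequence[float], market: Sequence[float],
          weight_model: float = 0.3) -> List[float]:
    """Linear blend of model and market probabilities.

    weight_model is an assumed value chosen for illustration only.
    """
    return [weight_model * m + (1.0 - weight_model) * k
            for m, k in zip(model, market)]


# Toy data (hypothetical, not from the report).
outcomes = [1, 0, 1, 0]
market = [0.8, 0.2, 0.6, 0.3]
model = [0.5, 0.4, 0.9, 0.1]

market_score = brier_score(market, outcomes)
model_score = brier_score(model, outcomes)
blend_score = brier_score(blend(model, market), outcomes)

print(f"market={market_score:.4f} model={model_score:.4f} blend={blend_score:.4f}")
```

In this toy data the model alone scores worse than the market, yet the blend scores better than either input. That is the pattern the abstract describes: a forecaster can trail consensus and still carry information that consensus lacks.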
Problem

Research questions and friction points this paper is trying to address.

Matching human superforecasters at judgmental forecasting over unstructured data
Countering behavioral biases in LLM-generated forecasts
Testing whether LLM forecasts add information beyond prediction market consensus
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic search over high-quality news sources
Supervisor agent reconciles disparate event forecasts
Statistical calibration counters LLM behavioral biases
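As an illustration of what a statistical calibration step can look like, the sketch below applies extremization in log-odds space, a one-parameter Platt-style scaling that pushes hedged probabilities away from 0.5. This is an assumption made for illustration: the text above does not name the specific calibration techniques the report uses, and the function names and the value of alpha are hypothetical.

```python
import math


def logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of p, clamped away from 0 and 1 to keep the log finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def extremize(p: float, alpha: float = 1.5) -> float:
    """Scale the log-odds of p by alpha.

    alpha > 1 moves p away from 0.5, alpha < 1 moves it toward 0.5,
    and alpha == 1 leaves it unchanged. In practice alpha would be fit
    on resolved questions; 1.5 here is an arbitrary illustrative value.
    """
    return sigmoid(alpha * logit(p))


for raw in (0.3, 0.5, 0.7):
    print(f"raw={raw:.2f} calibrated={extremize(raw):.3f}")
```

The transform is symmetric around 0.5 and leaves a forecast of exactly 0.5 unchanged, so it only sharpens forecasts that already lean one way.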
Authors

Rohan Alur, PhD Student at MIT (Machine Learning, Statistics, Human-AI Collaboration, Algorithmic Fairness)
Bradly C. Stadie, Bridgewater AIA Labs, New York, NY
Daniel Kang, UIUC (Computer Science)
Ryan Chen, Northwestern University
Matt McManus, Bridgewater AIA Labs, New York, NY
Michael Rickert, Bridgewater AIA Labs, New York, NY
Tyler Lee, Bridgewater AIA Labs, New York, NY
Michael Federici, Bridgewater AIA Labs, New York, NY
Richard Zhu, Bridgewater AIA Labs, New York, NY
Dennis Fogerty, Bridgewater AIA Labs, New York, NY
Hayley Williamson, Bridgewater AIA Labs, New York, NY
Nina Lozinski, Bridgewater AIA Labs, New York, NY
Aaron Linsky, Bridgewater AIA Labs, New York, NY
J. Sekhon, Bridgewater AIA Labs, New York, NY