PROPHET: An Inferable Future Forecasting Benchmark with Causal Intervened Likelihood Estimation

📅 2025-04-02
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing event prediction benchmarks share a critical flaw: many prediction tasks lack sufficient causal grounding, which undermines rigorous evaluation of models' causal reasoning capabilities. Method: We introduce PROPHET, the first future event prediction benchmark with formal inferential guarantees. Its core innovation is the Causal Intervened Likelihood (CIL) metric, which systematically integrates causal inference into benchmark construction by rigorously filtering prediction tasks for causal identifiability. PROPHET is built via news corpus curation, event trend analysis, and expert annotation, and validated using retrieval-augmented generation (RAG) and likelihood estimation. Contribution/Results: Experiments demonstrate that CIL effectively discriminates inferentially valid tasks; even state-of-the-art LLMs exhibit substantial performance bottlenecks on causally identifiable predictions. PROPHET establishes a new, principled evaluation standard and empirical foundation for trustworthy event forecasting.

๐Ÿ“ Abstract
Predicting future events stands as one of the ultimate aspirations of artificial intelligence. Recent advances in large language model (LLM)-based systems have shown remarkable potential in forecasting future events, garnering significant interest in the research community. Several benchmarks have been established to evaluate forecasting capabilities by formalizing event prediction as a retrieval-augmented generation (RAG) and reasoning task. In these benchmarks, each prediction question is answered with relevant retrieved news articles. However, because these benchmarks do not consider whether each question can be supported by valid or sufficient rationales, some of their questions may be inherently non-inferable. To address this issue, we introduce a new benchmark, PROPHET, which comprises inferable forecasting questions paired with relevant news for retrieval. To ensure the inferability of the benchmark, we propose Causal Intervened Likelihood (CIL), a statistical measure that assesses inferability through causal inference. In constructing this benchmark, we first collected recent trend forecasting questions and then filtered the data using CIL, resulting in an inferable benchmark for event prediction. Through extensive experiments, we first demonstrate the validity of CIL and use it to conduct in-depth investigations into event prediction. Subsequently, we evaluate several representative prediction systems on PROPHET, drawing valuable insights for future directions.
Problem

Research questions and friction points this paper is trying to address.

Assessing inferability of future event forecasting questions
Developing a benchmark with causal inference validation
Evaluating prediction systems using inferable forecasting questions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal Intervened Likelihood for inferability
Retrieval-augmented generation for event prediction
Filtering data using statistical measures
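
The filtering step named above can be illustrated with a minimal sketch. The summary does not give the CIL formula, so the code treats CIL scores as precomputed inputs. The `ForecastQuestion` type, the `filter_inferable` function, the threshold, and the "at least N supporting articles" rule are all hypothetical choices for illustration, not the authors' procedure.

```python
from dataclasses import dataclass, field


@dataclass
class ForecastQuestion:
    text: str
    # Hypothetical: one precomputed CIL score per retrieved news article.
    # How CIL is actually estimated is not described in this summary.
    article_cil: list[float] = field(default_factory=list)


def filter_inferable(questions, threshold=0.0, min_supporting=1):
    """Keep questions with at least `min_supporting` articles whose
    score exceeds `threshold` (an assumed inferability rule)."""
    kept = []
    for q in questions:
        supporting = sum(1 for score in q.article_cil if score > threshold)
        if supporting >= min_supporting:
            kept.append(q)
    return kept


# Toy data with made-up scores, purely to exercise the filter.
questions = [
    ForecastQuestion("Q1", [0.4, -0.1, 0.2]),
    ForecastQuestion("Q2", [-0.3, 0.0]),
    ForecastQuestion("Q3", []),
]
inferable = filter_inferable(questions, threshold=0.0, min_supporting=2)
print([q.text for q in inferable])  # ['Q1']
```

Only Q1 survives, because it is the only question with two articles scoring above the threshold. A real pipeline would replace the toy scores with estimates from the paper's CIL procedure.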
Authors
Zhengwei Tao
Peking University
Agent, Data Synthesis
Zhi Jin
Sun Yat-Sen University, Associate Professor
Bincheng Li
Guangzhou University
Xiaoying Bai
Tsinghua University
Software engineering, software testing, service-oriented computing, cloud computing
Haiyan Zhao
Peking University
Chengfeng Dou
Key Laboratory of High Confidence Software Technologies (PKU), MOE, China; School of Computer Science, Peking University
Xiancai Chen
Key Laboratory of High Confidence Software Technologies (PKU), MOE, China; School of Computer Science, Peking University
Jia Li
Key Laboratory of High Confidence Software Technologies (PKU), MOE, China; School of Computer Science, Peking University
Linyu Li
Peking University
knowledge graph, ai4science
Chongyang Tao
Associate Professor of Computer Science, Beihang University
Natural Language Processing, Dialogue Systems, Information Retrieval, Data Intelligence