HoTPP Benchmark: Are We Good at the Long Horizon Events Forecasting?

📅 2024-06-20
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
📄 PDF
🤖 AI Summary
Existing event prediction research primarily focuses on single-step next-event forecasting; long-horizon, multi-step joint prediction (of both time and type) remains largely unexplored. Method: We introduce HoTPP, the first long-horizon temporal point process (TPP) benchmark, covering critical domains such as finance and healthcare, and propose T-mAP—a theoretically grounded metric for systematically evaluating models’ long-term predictive capability. Contribution/Results: Empirical analysis reveals that mainstream marked temporal point process (MTPP) models consistently underperform simple baselines (e.g., Poisson or historical-frequency predictors) in long-horizon forecasting and suffer from mode collapse. We quantitatively identify autoregressive sampling and intensity-based loss functions as key bottlenecks limiting long-range performance. To foster reproducibility and advancement, we open-source a unified evaluation framework, implementations of state-of-the-art models, and comprehensive experimental results—paving the way from short-horizon “myopic” modeling toward genuine long-horizon event prediction.

📝 Abstract
Forecasting multiple future events within a given time horizon is essential for applications in finance, retail, social networks, and healthcare. Marked Temporal Point Processes (MTPP) provide a principled framework to model both the timing and labels of events. However, most existing research focuses on predicting only the next event, leaving long-horizon forecasting largely underexplored. To address this gap, we introduce HoTPP, the first benchmark specifically designed to rigorously evaluate long-horizon predictions. We identify shortcomings in widely used evaluation metrics, propose a theoretically grounded T-mAP metric, present strong statistical baselines, and offer efficient implementations of popular models. Our empirical results demonstrate that modern MTPP approaches often underperform simple statistical baselines. Furthermore, we analyze the diversity of predicted sequences and find that most methods exhibit mode collapse. Finally, we analyze the impact of autoregression and intensity-based losses on prediction quality, and outline promising directions for future research. The HoTPP source code, hyperparameters, and full evaluation results are available on GitHub.
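The abstract notes that modern MTPP models often underperform simple statistical baselines. To make that concrete, here is a minimal sketch of one such baseline: forecast events over a horizon by repeating the mean historical inter-event gap and the most frequent historical label. This illustrates the general idea of a historical-frequency baseline, not the paper's exact implementation.

```python
from collections import Counter
from statistics import mean

def history_baseline(times, labels, horizon, start):
    """Forecast (time, label) events in (start, start + horizon].

    Assumes events recur at the mean historical inter-event gap,
    all carrying the most frequent historical label. Illustrative
    only; the benchmark's baselines may differ in detail.
    """
    if len(times) < 2:
        return []
    gap = mean(t2 - t1 for t1, t2 in zip(times, times[1:]))
    top_label = Counter(labels).most_common(1)[0][0]
    preds, t = [], start + gap
    while t <= start + horizon:
        preds.append((t, top_label))
        t += gap
    return preds
```

Despite ignoring all sequence dynamics, predictors of this kind set a surprisingly strong bar for long-horizon metrics, which is the paper's central empirical point.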
Problem

Research questions and friction points this paper is trying to address.

Evaluating long-horizon event forecasting in MTPP models
Addressing shortcomings in widely used evaluation metrics for long-horizon predictions
Analyzing mode collapse and prediction diversity in current methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces HoTPP benchmark for long-horizon forecasting
Proposes T-mAP metric for rigorous evaluation
Analyzes the impact of autoregression and intensity-based losses on prediction quality
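The T-mAP metric evaluates joint time-and-type predictions rather than a single next event. A toy version of the underlying idea is tolerance-based matching: a predicted event counts as correct if an unused ground-truth event with the same label lies within a time tolerance. The sketch below shows such a matcher; the paper's actual metric additionally ranks predictions by confidence and averages precision (hence "mAP"), which is omitted here.

```python
def match_events(pred, true, tol):
    """Match predicted (time, label) events to ground truth.

    A prediction matches the first unused true event with the same
    label whose time differs by at most `tol`. Returns (precision,
    recall). Illustrative sketch of time-aware matching, not the
    paper's T-mAP definition.
    """
    used, hits = set(), 0
    for pt, pl in sorted(pred):
        for i, (tt, tl) in enumerate(true):
            if i not in used and tl == pl and abs(tt - pt) <= tol:
                used.add(i)
                hits += 1
                break
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(true) if true else 0.0
    return precision, recall
```

Under such a metric, a model that collapses to one dominant event type (the mode-collapse failure the paper reports) scores poorly on recall for all other types, which a next-event accuracy metric would not reveal.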