FinDeepForecast: A Live Multi-Agent System for Benchmarking Deep Research Agents in Financial Forecasting

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
Existing evaluation frameworks struggle to assess the live, forward-looking reasoning of deep research agents in high-stakes financial domains. To address this gap, this work proposes FinDeepForecast, an end-to-end multi-agent system with a dual-track task taxonomy that dynamically generates recurrent and non-recurrent financial forecasting tasks at both corporate and macroeconomic levels. The system is used to build FinDeepForecastBench, a weekly benchmark spanning ten weeks, 8 economies, and 1,314 listed companies, on which 13 representative methods are evaluated. Deep research agents consistently outperformed strong baselines yet still fell short of genuine forward-looking financial reasoning. The benchmark and leaderboard are publicly available on the OpenFinArena platform.

πŸ“ Abstract
Deep Research (DR) Agents powered by advanced Large Language Models (LLMs) have fundamentally shifted the paradigm for completing complex research tasks. Yet, a comprehensive and live evaluation of their forecasting performance on real-world, research-oriented tasks in high-stakes domains (e.g., finance) remains underexplored. We introduce FinDeepForecast, the first live, end-to-end multi-agent system for automatically evaluating DR agents by continuously generating research-oriented financial forecasting tasks. This system is equipped with a dual-track taxonomy, enabling the dynamic generation of recurrent and non-recurrent forecasting tasks at both corporate and macro levels. With this system, we generate FinDeepForecastBench, a weekly evaluation benchmark over a ten-week horizon, encompassing 8 global economies and 1,314 listed companies, and evaluate 13 representative methods. Extensive experiments show that, while DR agents consistently outperform strong baselines, their performance still falls short of genuine forward-looking financial reasoning. We expect the proposed FinDeepForecast system to consistently facilitate future advancements of DR agents in research-oriented financial forecasting tasks. The benchmark and leaderboard are publicly available on the OpenFinArena Platform.
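The dual-track taxonomy described in the abstract can be read as a 2×2 grid: recurrent versus non-recurrent tasks, at the corporate versus macro level. The following is a minimal sketch of that reading only. The class names, field names, and example tasks are hypothetical and are not taken from the paper or from the FinDeepForecast system.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Recurrence(Enum):
    """First track of the taxonomy (labels follow the abstract's wording)."""
    RECURRENT = "recurrent"
    NON_RECURRENT = "non-recurrent"


class Level(Enum):
    """Second track of the taxonomy."""
    CORPORATE = "corporate"
    MACRO = "macro"


@dataclass(frozen=True)
class ForecastTask:
    # All fields are illustrative assumptions, not the paper's schema.
    recurrence: Recurrence
    level: Level
    target: str    # placeholder name of a company or economy
    question: str  # placeholder forecasting question


def taxonomy_cells():
    """Return the four (recurrence, level) combinations of the grid."""
    return list(product(Recurrence, Level))


def group_by_cell(tasks):
    """Bucket tasks by taxonomy cell; cells with no tasks stay as empty lists."""
    cells = {cell: [] for cell in taxonomy_cells()}
    for task in tasks:
        cells[(task.recurrence, task.level)].append(task)
    return cells


# Placeholder tasks, invented purely for illustration.
example_tasks = [
    ForecastTask(Recurrence.RECURRENT, Level.CORPORATE,
                 "ExampleCorp", "What will next quarter's revenue be?"),
    ForecastTask(Recurrence.NON_RECURRENT, Level.MACRO,
                 "ExampleEconomy", "Will the policy rate change this month?"),
]
grouped = group_by_cell(example_tasks)
```

Keying tasks by the `(recurrence, level)` pair keeps every cell of the grid visible, including empty ones, which makes it easy to check whether a weekly task batch covers all four combinations.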
Problem

Research questions and friction points this paper is trying to address.

Deep Research Agents
Financial Forecasting
Live Benchmarking
LLM-based Evaluation
Research-Oriented Tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent system
financial forecasting
deep research agents
live benchmark
LLM-based evaluation