🤖 AI Summary
Prior work lacks a systematic evaluation of large language models' (LLMs) practical capabilities on predictive analytics tasks. Method: We introduce PredictiQ, a comprehensive benchmark of 1,130 real-world, data-driven prediction questions spanning eight domains, and propose a three-dimensional evaluation framework integrating textual understanding, executable code generation, and logical consistency verification. Empirical assessment covers 12 state-of-the-art LLMs, leveraging program synthesis, natural language inference, and structured output validation. Results: Current LLMs exhibit weak generalization, high code error rates, and deficient causal reasoning on prediction tasks, with overall accuracy below 40%. These findings reveal fundamental limitations in their reliability for rigorous predictive analysis. This work establishes the first domain-specific benchmark and methodological foundation for evaluating LLMs in predictive analytics.
📝 Abstract
Predictive analysis is a cornerstone of modern decision-making, with applications across diverse domains. Large Language Models (LLMs) have emerged as powerful tools for enabling nuanced, knowledge-intensive conversations, thereby aiding complex decision-making tasks. With growing expectations to harness LLMs for predictive analysis, there is an urgent need to systematically assess their capability in this domain; however, existing studies lack such evaluations. To bridge this gap, we introduce the **PredictiQ** benchmark, which integrates 1,130 sophisticated predictive analysis queries drawn from 44 real-world datasets spanning 8 diverse fields. We design an evaluation protocol that considers text analysis, code generation, and their alignment. Twelve renowned LLMs are evaluated, offering insights into their practical use in predictive analysis. Overall, we find that existing LLMs still face considerable challenges in conducting predictive analysis. Code is available at https://github.com/Cqkkkkkk/PredictiQ.