Can Large Language Models Adequately Perform Symbolic Reasoning Over Time Series?

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Can large language models (LLMs) perform interpretable, context-aligned symbolic reasoning over time series? This work systematically evaluates LLMs on three core scientific tasks: multivariate symbolic regression, Boolean network inference, and causal discovery. To this end, it introduces SymbolBench, the first benchmark explicitly designed for symbolic reasoning over real-world time series, and proposes a closed-loop framework that couples LLMs with genetic programming (GP), with LLMs acting as both predictors and evaluators. The approach integrates domain-knowledge guidance, context-alignment mechanisms, and structured reasoning strategies to enable efficient search and verification within the symbolic space. Experiments show gains in both the accuracy and the interpretability of discovered symbolic laws. The study also identifies key limitations of current LLMs in symbolic reasoning, particularly in compositional generalization, constraint satisfaction, and formal validation, and points to concrete directions for improvement. By unifying neural and symbolic paradigms, this work charts a path for AI-driven scientific discovery.
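
The closed-loop LLM–GP idea can be sketched as follows. This is a hypothetical toy illustration, not the paper's implementation: the LLM proposer is stubbed with a fixed candidate list (`llm_propose`), GP variation is a trivial string mutation, and "verification" is a numeric fitness on the observed series.

```python
import random

def llm_propose():
    """Stand-in for an LLM call that proposes symbolic candidates (stubbed)."""
    return ["x", "x*x", "2*x", "x+1"]

def mutate(expr):
    """Toy GP mutation: append a random unit perturbation to the expression."""
    return expr + random.choice(["+1", "-1", "*2"])

def fitness(expr, xs, ys):
    """Mean squared error of a candidate expression against the observed series."""
    try:
        preds = [eval(expr, {"x": x}) for x in xs]
    except Exception:
        return float("inf")  # malformed candidates are penalized
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)

def closed_loop_search(xs, ys, generations=20, keep=4):
    """Propose (LLM stub) -> vary (GP) -> verify (fitness) -> select, repeatedly."""
    pop = llm_propose()
    for _ in range(generations):
        pop = pop + [mutate(e) for e in pop]        # GP variation
        pop.sort(key=lambda e: fitness(e, xs, ys))  # verification step
        pop = pop[:keep]                            # selection
    return pop[0]

# Toy target law: y = 2x + 1
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
best = closed_loop_search(xs, ys)
```

Because the best candidate always survives selection, the returned expression is never worse than the best initial proposal; in the paper's framework, the LLM additionally replaces or augments both the proposal and the evaluation steps with context-aware reasoning.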

📝 Abstract
Uncovering hidden symbolic laws from time series data, as an aspiration dating back to Kepler's discovery of planetary motion, remains a core challenge in scientific discovery and artificial intelligence. While Large Language Models show promise in structured reasoning tasks, their ability to infer interpretable, context-aligned symbolic structures from time series data is still underexplored. To systematically evaluate this capability, we introduce SymbolBench, a comprehensive benchmark designed to assess symbolic reasoning over real-world time series across three tasks: multivariate symbolic regression, Boolean network inference, and causal discovery. Unlike prior efforts limited to simple algebraic equations, SymbolBench spans a diverse set of symbolic forms with varying complexity. We further propose a unified framework that integrates LLMs with genetic programming to form a closed-loop symbolic reasoning system, where LLMs act both as predictors and evaluators. Our empirical results reveal key strengths and limitations of current models, highlighting the importance of combining domain knowledge, context alignment, and reasoning structure to improve LLMs in automated scientific discovery.
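
To make the Boolean network inference task concrete, here is a minimal hypothetical example (not from the paper): a candidate update rule for one node is scored by the fraction of observed state transitions it reproduces.

```python
from itertools import product

def score_rule(rule, transitions):
    """Fraction of observed (state -> next value) transitions the rule reproduces."""
    hits = sum(1 for state, nxt in transitions if rule(*state) == nxt)
    return hits / len(transitions)

# Toy ground-truth dynamics for one node: x' = a AND (NOT b)
truth = lambda a, b: bool(a) and not b
transitions = [((a, b), truth(a, b)) for a, b in product([0, 1], repeat=2)]

candidate_good = lambda a, b: bool(a) and not b  # matches the dynamics
candidate_bad = lambda a, b: bool(a or b)        # wrong on two transitions
```

A perfect candidate scores 1.0 and the OR rule scores 0.5 here; in SymbolBench the analogous check runs over real, noisy time series rather than an exhaustive truth table.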
Problem

Research questions and friction points this paper is trying to address.

Evaluate LLMs' ability to infer symbolic structures from time series
Assess symbolic reasoning across diverse real-world tasks
Combine LLMs with genetic programming for scientific discovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces SymbolBench for symbolic reasoning evaluation
Combines LLMs with genetic programming in a closed-loop framework
Evaluates multivariate symbolic regression, Boolean network inference, and causal discovery