🤖 AI Summary
This work investigates whether large language models (LLMs) can perform multi-step spatiotemporal extrapolation of partial differential equation (PDE) dynamics in a true zero-shot setting, without fine-tuning, natural-language prompting, or explicit physical priors, using only sequences of discretized, text-encoded PDE solutions as context.
Method: We propose a text-based discrete-state sequence modeling framework for PDEs, integrating in-context learning with token-level distribution modeling to realize recursive zero-shot prediction.
Contribution/Results: We demonstrate, for the first time, that LLMs implicitly learn PDE dynamical structure. We introduce "in-context neural scaling laws" to characterize how prediction accuracy scales with temporal context length and spatial grid resolution. Extensive evaluation across canonical PDE systems, including reaction-diffusion, Burgers', and wave equations, shows high-fidelity zero-shot extrapolation, with algebraic error growth mirroring the global error accumulation behavior of numerical solvers. This establishes a novel paradigm for physics-informed modeling using LLMs.
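To make the method concrete, here is a minimal sketch of how a discretized PDE solution might be serialized to text and extrapolated recursively. The fixed-precision, comma-separated encoding and the `persistence_model` stand-in for the LLM are illustrative assumptions, not the paper's exact tokenization or model interface:

```python
# Sketch: text-encode discretized PDE states and roll them out recursively.
# Encoding (fixed precision, commas within a snapshot, newlines between
# time steps) is an assumption for illustration.

def encode_state(u, precision=3):
    """Serialize one spatial snapshot (list of floats) as a text line."""
    return ",".join(f"{x:.{precision}f}" for x in u)

def decode_state(line):
    """Parse a text line back into a list of floats."""
    return [float(tok) for tok in line.split(",")]

def rollout(model, history, n_steps):
    """Recursive zero-shot prediction: feed the text context to the model,
    decode its emitted next state, append it to the context, and repeat."""
    context = "\n".join(encode_state(u) for u in history)
    states = list(history)
    for _ in range(n_steps):
        next_line = model(context)      # the LLM emits the next encoded state
        u_next = decode_state(next_line)
        states.append(u_next)
        context += "\n" + encode_state(u_next)
    return states

# Toy stand-in for the LLM: repeats the last state (a persistence baseline).
def persistence_model(context):
    return context.splitlines()[-1]
```

Because each predicted state re-enters the context, any decoding error compounds over steps, which is why error growth with rollout horizon is the natural quantity to measure.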
📝 Abstract
Large language models (LLMs) have demonstrated emergent in-context learning (ICL) capabilities across a range of tasks, including zero-shot time-series forecasting. We show that text-trained foundation models can accurately extrapolate spatiotemporal dynamics from discretized partial differential equation (PDE) solutions without fine-tuning or natural language prompting. Predictive accuracy improves with longer temporal contexts but degrades at finer spatial discretizations. In multi-step rollouts, where the model recursively predicts future spatial states over multiple time steps, errors grow algebraically with the time horizon, reminiscent of global error accumulation in classical finite-difference solvers. We interpret these trends as in-context neural scaling laws, where prediction quality varies predictably with both context length and output length. To understand how LLMs internally process PDE solutions to produce accurate rollouts, we analyze token-level output distributions and uncover a consistent ICL progression: beginning with syntactic pattern imitation, transitioning through an exploratory high-entropy phase, and culminating in confident, numerically grounded predictions.
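The token-level analysis described above can be sketched as tracking the Shannon entropy of the model's next-token distribution across generation steps: a rise-then-fall entropy profile would match the imitation, exploration, then numerically grounded progression. The functions below are a generic illustration under that assumption, not the paper's analysis code:

```python
import math

def token_entropy(probs):
    """Shannon entropy (in bits) of one next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_profile(distributions):
    """Entropy at each generation step; a high-entropy middle phase between
    low-entropy ends would match the three-phase ICL progression."""
    return [token_entropy(p) for p in distributions]
```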