🤖 AI Summary
Automated discovery of interpretable physical laws remains challenging: conventional SINDy relies on an expert-chosen candidate function library and optimization method, which limits its applicability. This paper introduces Al-Khwarizmi, a novel agentic framework that combines large language models (LLMs), vision-language models (VLMs), and retrieval-augmented generation (RAG) with SINDy to inject prior physical knowledge, interpret multimodal observations (textual descriptions, raw data, and plots), and generate candidate libraries and optimizer configurations automatically. The framework builds interpretable dynamical models via L1-regularized sparse regression and refines candidate solutions iteratively through reflection. Evaluated on more than 198 models, it achieves state-of-the-art performance, a 20 percent improvement over the best-performing alternative, while reducing dependence on domain expertise. This work advances the automation of physical law discovery.
📝 Abstract
Inferring physical laws from data is a central challenge in science and engineering, including but not limited to healthcare, physical sciences, biosciences, social sciences, sustainability, climate, and robotics. Deep networks achieve high accuracy but lack interpretability, prompting interest in models built from simple components. The Sparse Identification of Nonlinear Dynamics (SINDy) method has become the go-to approach for building such modular and interpretable models. SINDy uses sparse regression with L1 regularization to select the key terms from a library of candidate functions. However, choosing the candidate library and the optimization method requires significant technical expertise, which limits SINDy's applicability. This work introduces Al-Khwarizmi, a novel agentic framework for physical law discovery from data that integrates foundation models with SINDy. Leveraging LLMs, VLMs, and Retrieval-Augmented Generation (RAG), our approach automates physical law discovery, incorporating prior knowledge and iteratively refining candidate solutions via reflection. Al-Khwarizmi operates in two steps: first it summarizes the system observations (textual descriptions, raw data, and plots), then it generates candidate feature libraries and optimizer configurations to identify the hidden physical laws. Evaluating our algorithm on over 198 models, we demonstrate state-of-the-art performance, with a 20 percent improvement over the best-performing alternative.
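To make the SINDy step concrete, the following is a minimal sketch of L1-regularized sparse regression over a candidate library, applied to a synthetic harmonic oscillator. It is not the authors' implementation and shows none of the agentic components (LLM, VLM, RAG, reflection). The function names `build_library` and `lasso_ista`, the degree-2 polynomial library, the ISTA solver, and the penalty value `lam=0.01` are all assumptions made for illustration. The hand-written library here stands in for the part that Al-Khwarizmi is described as generating automatically.

```python
import numpy as np


def build_library(x, y):
    """Stack candidate functions column-wise (polynomials up to degree 2).

    Hypothetical, hand-picked library used only for this illustration.
    """
    names = ["1", "x", "y", "x^2", "x*y", "y^2"]
    theta = np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])
    return theta, names


def lasso_ista(theta, target, lam=0.01, n_iter=500):
    """Minimize (1/2n)*||target - theta @ xi||^2 + lam*||xi||_1 with ISTA."""
    n = theta.shape[0]
    # Step size 1/L, where L is the largest eigenvalue of theta.T @ theta / n.
    step = n / np.linalg.norm(theta, 2) ** 2
    xi = np.zeros(theta.shape[1])
    for _ in range(n_iter):
        grad = theta.T @ (theta @ xi - target) / n
        z = xi - step * grad
        # Soft thresholding: the proximal operator of the L1 penalty.
        xi = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return xi


# Synthetic observations of the oscillator dx/dt = -y, dy/dt = x.
t = np.linspace(0.0, 4.0 * np.pi, 2000, endpoint=False)
x, y = np.cos(t), np.sin(t)
dt = t[1] - t[0]
# Derivatives are estimated from the data by finite differences.
x_dot, y_dot = np.gradient(x, dt), np.gradient(y, dt)

theta, names = build_library(x, y)
for label, target in [("dx/dt", x_dot), ("dy/dt", y_dot)]:
    xi = lasso_ista(theta, target)
    terms = [f"{c:+.2f}*{name}" for c, name in zip(xi, names) if c != 0.0]
    print(label, "=", " ".join(terms))
```

On this noise-free example the regression keeps a single library term per equation (`y` for `dx/dt`, `x` for `dy/dt`), with coefficients shrunk slightly toward zero by the L1 penalty. On real data, the outcome depends strongly on which functions are in the library and on the penalty strength, which are the choices the abstract identifies as requiring expertise.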