Discovering physical laws with parallel combinatorial tree search

📅 2024-07-05
🤖 AI Summary
Symbolic regression must balance formula simplicity, generalization, and search efficiency within an infinite expression space, which limits its applicability to scientific discovery. To address this, the authors propose the parallel combinatorial tree search (PCTS) framework, which integrates syntax-tree structural constraints, distributed enumeration, semantics-aware priority scheduling, and a differentiable symbolic evaluator, complemented by a parallelized pruning mechanism. Evaluated on over 200 synthetic and experimental benchmark datasets, PCTS is reported to achieve up to a 99% accuracy improvement and an order-of-magnitude speedup over state-of-the-art methods. The approach targets efficient discovery of interpretable physical laws from limited observational data, advancing the practical utility of symbolic regression in scientific modeling.

📝 Abstract
Symbolic regression plays a crucial role in modern scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data. A grand challenge lies in searching an infinite space for parsimonious and generalizable mathematical formulas that also fit the training data. For over a decade, existing algorithms have faced a critical bottleneck in accuracy and efficiency when handling complex problems, which hinders the application of symbolic regression to scientific exploration across disciplines. To this end, we introduce a parallel combinatorial tree search (PCTS) model to efficiently distill generic mathematical expressions from limited data. Through extensive experiments, we demonstrate the superior accuracy and efficiency of PCTS for equation discovery: it greatly outperforms state-of-the-art baseline models on over 200 synthetic and experimental datasets (e.g., up to a 99% accuracy improvement and an order-of-magnitude speedup). PCTS represents a key advance in accurate and efficient data-driven discovery of symbolic, interpretable models (e.g., underlying physical laws) and marks a pivotal transition towards scalable symbolic learning.
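
To make the idea of a combinatorial search over expression trees concrete, here is a minimal Python sketch. It is not the PCTS algorithm, whose details the abstract does not give. It is only a brute-force illustration under stated assumptions: the operator sets, the leaf symbols, the complexity penalty, and every function name (`enumerate_trees`, `evaluate`, `size`, `score`, `search`) are hypothetical. Candidates are scored by mean squared error plus a small per-node penalty to favor parsimonious formulas, and scoring is spread over a thread pool purely to show the parallel structure. In CPython a process pool would likely be needed for a real CPU speedup.

```python
import itertools
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical operator sets chosen for illustration (not taken from the paper).
BINARY = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
UNARY = {"sin": math.sin, "sq": lambda a: a * a}
LEAVES = ("x", "1")


def enumerate_trees(depth):
    """Return all distinct expression trees of depth <= `depth` as nested tuples."""
    trees = list(LEAVES)
    for _ in range(depth):
        grown = [(op, t) for op in UNARY for t in trees]
        grown += [(op, l, r) for op in BINARY
                  for l, r in itertools.product(trees, repeat=2)]
        trees = list(dict.fromkeys(trees + grown))  # drop duplicates, keep order
    return trees


def evaluate(tree, x):
    """Evaluate an expression tree at the scalar input x."""
    if tree == "x":
        return x
    if tree == "1":
        return 1.0
    op, *args = tree
    vals = [evaluate(a, x) for a in args]
    return UNARY[op](*vals) if len(vals) == 1 else BINARY[op](*vals)


def size(tree):
    """Number of nodes in the tree, used as a simple complexity measure."""
    return 1 if isinstance(tree, str) else 1 + sum(size(a) for a in tree[1:])


def score(tree, xs, ys, penalty=1e-3):
    """Mean squared error plus a per-node penalty that favors short formulas."""
    mse = sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return mse + penalty * size(tree)


def search(xs, ys, depth=2, workers=4):
    """Score every candidate tree concurrently and return the best one."""
    candidates = enumerate_trees(depth)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda t: score(t, xs, ys), candidates))
    best = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


if __name__ == "__main__":
    xs = [0.5, 1.0, 1.5, 2.0, 2.5]
    ys = [x * x + x for x in xs]  # synthetic data from a known formula
    print(search(xs, ys))
```

Even with only five operators and two leaf symbols, the number of candidates grows from 18 at depth 1 to 1010 at depth 2, which illustrates why exhaustive enumeration alone does not scale and why pruning and prioritized scheduling matter in practice.
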
Problem

Research questions and friction points this paper is trying to address.

Efficiently discover parsimonious mathematical formulas from data.
Overcome accuracy and efficiency bottlenecks in symbolic regression.
Enable scalable and interpretable model discovery for scientific exploration.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel combinatorial tree search for symbolic regression
Efficient distillation of mathematical expressions from data
Superior accuracy and efficiency in equation discovery
Authors

Kai Ruan
Gaoling School of Artificial Intelligence, Renmin University of China
AI for Science, Symbolic regression
Ze-Feng Gao
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China
Yike Guo
Department of Computer Science and Engineering, HKUST, Hong Kong, China
Hao Sun
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China
Ji-Rong Wen
Gaoling School of Artificial Intelligence, Renmin University of China
Large Language Model, Web Search, Information Retrieval, Machine Learning
Yang Liu
School of Engineering Science, University of Chinese Academy of Sciences, Beijing, China; State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences, Beijing, China