🤖 AI Summary
Symbolic regression must balance formula simplicity, generalization, and search efficiency within an infinite expression space, which limits its applicability to scientific discovery. To address this, we propose the Parallel Combinatorial Tree Search (PCTS) framework, which integrates syntax-tree structural constraints, distributed enumeration, semantics-aware priority scheduling, and a differentiable symbolic evaluator, complemented by a parallelized pruning mechanism. Evaluated on over 200 synthetic and experimental benchmark datasets, PCTS achieves up to a 99% accuracy improvement and an order-of-magnitude speedup compared to state-of-the-art methods. It offers an efficient route to discovering interpretable physical laws from limited observational data, advancing the practical utility of symbolic regression in scientific modeling.
📝 Abstract
Symbolic regression plays a crucial role in modern scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data. A grand challenge lies in the arduous search for parsimonious and generalizable mathematical formulas in an infinite search space while still fitting the training data. For over a decade, existing algorithms have faced a critical bottleneck in accuracy and efficiency when handling complex problems, which hinders the application of symbolic regression to scientific exploration across disciplines. To this end, we introduce a parallel combinatorial tree search (PCTS) model to efficiently distill generic mathematical expressions from limited data. Through extensive experiments, we demonstrate the superior accuracy and efficiency of PCTS for equation discovery: it greatly outperforms state-of-the-art baseline models on over 200 synthetic and experimental datasets (e.g., up to a 99% accuracy improvement and a one-order-of-magnitude speedup). PCTS represents a key advance in accurate and efficient data-driven discovery of symbolic, interpretable models (e.g., underlying physical laws) and marks a pivotal transition towards scalable symbolic learning.