🤖 AI Summary
To address negative transfer and poor generalization of foundation models in few-shot symbolic regression, this paper proposes EQUATE, a framework that reformulates discrete equation search as a continuous optimization problem in a shared embedding space, enabling lightweight adaptation of foundation models via knowledge distillation. It introduces a symbolic-numeric alignment mechanism to enforce semantic consistency, and an evaluator-guided embedding optimization with parsimony regularization to jointly optimize equation accuracy and simplicity. Evaluated on the Feynman, Strogatz, and black-box datasets, EQUATE significantly outperforms state-of-the-art methods across four key dimensions: accuracy, robustness, model simplicity, and inference speed. Notably, it is the first method to deliver high-quality, low-overhead, and interpretable end-to-end equation discovery.
📝 Abstract
Discovering interpretable mathematical equations from observed data (a.k.a. equation discovery or symbolic regression) is a cornerstone of scientific discovery, enabling transparent modeling of physical, biological, and economic systems. While foundation models pre-trained on large-scale equation datasets offer a promising starting point, they often suffer from negative transfer and poor generalization when applied to small, domain-specific datasets. In this paper, we introduce EQUATE (Equation Generation via QUality-Aligned Transfer Embeddings), a data-efficient fine-tuning framework that adapts foundation models for symbolic equation discovery in low-data regimes via distillation. EQUATE combines symbolic-numeric alignment with evaluator-guided embedding optimization, enabling a principled embedding-search-generation paradigm. Our approach reformulates discrete equation search as a continuous optimization task in a shared embedding space, guided by data-equation fitness and simplicity. Experiments across three standard public benchmarks (Feynman, Strogatz, and black-box datasets) demonstrate that EQUATE consistently outperforms state-of-the-art baselines in both accuracy and robustness, while preserving low complexity and fast inference. These results highlight EQUATE as a practical and generalizable solution for data-efficient symbolic regression in foundation model distillation settings.
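The core idea, searching a continuous embedding space under a combined fitness-plus-simplicity objective, can be sketched in miniature. Everything below is an illustrative assumption, not the paper's implementation: `toy_evaluator` stands in for the data-equation fitness evaluator, and an L1 penalty on the embedding serves as a proxy for the parsimony term.

```python
import numpy as np

def toy_evaluator(z):
    """Stand-in for the data-equation fitness evaluator (assumption):
    scores how well the equation decoded from embedding z would fit
    the data. Here: squared distance to a fixed 'good' embedding."""
    target = np.array([1.0, -2.0, 0.5, 0.0])
    return np.sum((z - target) ** 2)

def parsimony_penalty(z):
    """Illustrative simplicity term (assumption): an L1 penalty on the
    embedding as a proxy for equation complexity."""
    return np.sum(np.abs(z))

def optimize_embedding(z0, lam=0.1, lr=0.05, steps=200, eps=1e-5):
    """Evaluator-guided search in the continuous embedding space:
    gradient descent on fitness + lam * parsimony, using central
    finite differences so the evaluator can be a black box."""
    objective = lambda v: toy_evaluator(v) + lam * parsimony_penalty(v)
    z = z0.copy()
    for _ in range(steps):
        grad = np.zeros_like(z)
        for i in range(len(z)):
            zp, zm = z.copy(), z.copy()
            zp[i] += eps
            zm[i] -= eps
            grad[i] = (objective(zp) - objective(zm)) / (2 * eps)
        z -= lr * grad
    return z

z_star = optimize_embedding(np.zeros(4))
```

In EQUATE the optimized embedding would then be decoded back into a symbolic equation by the foundation model (the "generation" step of the embedding-search-generation paradigm); this sketch only covers the search step.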