When Good Equations Get Bad Scores: Improving Symbolic Regression Through Better Parameter Optimization

📅 2026-05-22

🤖 AI Summary
This work addresses a common failure mode in symbolic regression: a correct equation structure receives a low score because its parameters were poorly fitted, the so-called "good structure, bad score" problem. Because parameter fitting is non-convex, fast local solvers are easily trapped in poor local optima. To mitigate this, the authors propose SAGE-Fit, a framework that, reportedly for the first time, integrates the structural and semantic priors of symbolic expressions directly into the inner-loop parameter optimization. This structure-aware, semantics-guided fitting strategy improves the fidelity of structure evaluation. Designed as a plug-and-play module, SAGE-Fit can be incorporated into existing symbolic regression systems, and the authors report improved structure-evaluation accuracy and overall performance across multiple benchmark datasets.
📝 Abstract
Symbolic Regression (SR) plays a central role in scientific knowledge discovery by distilling mathematical equations from observational data. Most existing SR methods function within a bi-level optimization framework: an outer loop that searches for the discrete equation structure, and an inner loop that optimizes the continuous parameters of that structure. Crucially, parameter-fitting quality directly determines a structure's score and thus the outer-loop search. However, nonlinear operators make the inner loop highly non-convex, and budget-driven reliance on fast local solvers (e.g., BFGS) often yields poor local minima and underestimated scores for correct structures. This "Good Structure, Bad Score" phenomenon becomes a key bottleneck, degrading efficiency and misguiding the search away from the true equation. To resolve this, we propose SAGE-Fit (Structure-Aware and Semantics-Guided Evaluator for Symbolic Regression), an SR-native fitting framework that exploits the dual native priors of symbolic expressions. By capitalizing on the structural and semantic priors unique to SR, we design tailored modules for each property, thereby effectively mitigating this optimization bottleneck. Extensive experiments demonstrate that our approach, as a plug-and-play module, significantly enhances evaluation fidelity and universally improves the performance of various SR systems.
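The "Good Structure, Bad Score" effect described in the abstract can be reproduced on a toy problem. The sketch below is an illustration only and is not SAGE-Fit: it assumes a hypothetical ground-truth equation y = sin(3x), a candidate structure sin(w·x) with the correct form, and SciPy's BFGS as the fast local solver. The starting value 1.0, the grid of restarts, and the helper names `mse` and `fit_bfgs` are all assumptions made for this example. A generic multi-start baseline is used as the comparison, since the source does not describe SAGE-Fit's internals.

```python
import numpy as np
from scipy.optimize import minimize

# Toy data generated by the assumed "true" equation y = sin(3 * x).
x = np.linspace(-3.0, 3.0, 200)
y = np.sin(3.0 * x)

def mse(params):
    """Inner-loop objective for the candidate structure f(x) = sin(w * x)."""
    w = params[0]
    return float(np.mean((np.sin(w * x) - y) ** 2))

def fit_bfgs(w0):
    """Run one local BFGS fit from w0 and return (fitted w, final loss)."""
    result = minimize(mse, x0=np.array([w0]), method="BFGS")
    return float(result.x[0]), float(result.fun)

# Single start from a generic default initial value (hypothetical choice).
w_single, loss_single = fit_bfgs(1.0)

# Multi-start baseline: the same local solver from a grid of initial values,
# keeping the best fit. This is a generic remedy, not SAGE-Fit.
starts = np.linspace(0.1, 6.0, 12)
w_multi, loss_multi = min((fit_bfgs(w0) for w0 in starts), key=lambda r: r[1])

print(f"single start: w = {w_single:.3f}, MSE = {loss_single:.3e}")
print(f"multi-start:  w = {w_multi:.3f}, MSE = {loss_multi:.3e}")
```

Both runs score the same, correct structure. On this toy data the single start is expected to stall in a local minimum far from w = 3, so an outer-loop search that trusted its score would rank the true equation poorly, while the multi-start run recovers a near-zero error at a higher evaluation cost.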
Problem

Research questions and friction points this paper is trying to address.

Symbolic Regression
Parameter Optimization
Non-convex Optimization
Local Minima
Equation Discovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

Symbolic Regression
Parameter Optimization
Structure-Aware
Semantics-Guided
Non-convex Optimization
Authors

Boxiao Wang
Institute of Automation, Chinese Academy of Sciences
Kai Li
University of Chinese Academy of Sciences & City University of Hong Kong
Computer Vision, Multimodal Language Model, Remote Sensing
Zhiwei Chen
Institute of Automation, Chinese Academy of Sciences
Yang Huang
Institute of Automation, Chinese Academy of Sciences
Runxiang Wang
Institute of Automation, Chinese Academy of Sciences
Ziwen Zhang
Institute of Automation, Chinese Academy of Sciences
Yifan Zhang
Institute of Automation, Chinese Academy of Sciences
Jian Cheng
Institute of Automation, Chinese Academy of Sciences