🤖 AI Summary
To address high formula complexity and poor recovery of true solutions in high-dimensional symbolic regression, this paper proposes an iterative dimensionality reduction method that combines variables based on functional dependency analysis. The method automatically identifies fixed algebraic combinations of input variables (such as sums, products, or ratios) and replaces each with a new synthetic variable, reducing the dimensionality of the search space while preserving semantic equivalence. Its core innovation is to formalize the validity of a variable substitution as a functional dependency test and to couple this test with symbolic regression solvers, yielding end-to-end automated dimensionality reduction. Experiments show that the method reliably identifies valid variable combinations and improves the formula recovery rate and convergence speed of leading symbolic regression algorithms, including SR-Tree, AI-Feynman, and DeepSymbolic, across multiple benchmark datasets. On average, recovered expressions exhibit over 35% lower complexity.
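One way to write down the validity condition, using notation that is an assumption of this note rather than the paper's own: let the target be $y = f(x_1, \dots, x_n)$ and let $g(x_i, x_j)$ be a candidate combination of two inputs. The substitution $z = g(x_i, x_j)$ is valid when some function $h$ satisfies

$$
f(x_1, \dots, x_n) = h\bigl(g(x_i, x_j),\; x_{\setminus \{i,j\}}\bigr) \quad \text{for all } x \text{ in the domain},
$$

where $x_{\setminus \{i,j\}}$ denotes the remaining variables. In other words, $y$ is functionally dependent on $z$ and the remaining variables alone. A familiar example is Newton's law of gravitation, $F = G\, m_1 m_2 / r^2$, which depends on the two masses only through their product: with $z = m_1 m_2$ it becomes $F = G\, z / r^2$, a problem with one fewer input variable.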
📝 Abstract
Solutions of symbolic regression problems are expressions composed of input variables and operators from a finite set of function symbols. One measure for evaluating symbolic regression algorithms is their ability to recover formulas, up to symbolic equivalence, from finite samples. As expected, the recovery problem becomes harder as the formula becomes more complex, that is, as the number of variables and operators grows. Variables in naturally occurring symbolic formulas often appear only in fixed combinations. This can be exploited in symbolic regression by substituting one new variable for the combination, effectively reducing the number of variables. However, finding valid substitutions is challenging. Here, we address this challenge by searching over the expression space of small substitutions and testing each candidate for validity. The validity test is reduced to a test of functional dependence. The resulting iterative dimension reduction procedure can be used with any symbolic regression approach. We show that it reliably identifies valid substitutions and significantly boosts the performance of different types of state-of-the-art symbolic regression algorithms.
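The search-and-test loop described above can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not specify the functional dependence test, so a simple nearest-neighbour heuristic is swapped in here. If the target really is a function of the reduced inputs, samples that are close in the reduced space should have close target values. The names `OPS`, `nn_residual`, and `rank_substitutions`, the candidate set (pairwise sum, product, ratio), and the synthetic target are all assumptions made for illustration.

```python
import itertools
import numpy as np

# Candidate pairwise combinations (hypothetical set chosen for this sketch).
OPS = {
    "sum": lambda a, b: a + b,
    "product": lambda a, b: a * b,
    "ratio": lambda a, b: a / b,
}

def nn_residual(Z, y):
    """Mean |y_i - y_nn(i)|, where nn(i) is the nearest neighbour of row i in Z.

    Heuristic stand-in for a functional dependence test: a small value
    suggests that y is (approximately) a function of the columns of Z.
    """
    Z = (Z - Z.mean(axis=0)) / (Z.std(axis=0) + 1e-12)  # standardise columns
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    nearest = dist.argmin(axis=1)
    return float(np.mean(np.abs(y - y[nearest])))

def rank_substitutions(X, y):
    """Score every candidate z = op(x_i, x_j); lower score is more plausible."""
    results = []
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        rest = np.delete(X, [i, j], axis=1)  # variables not being combined
        for name, op in OPS.items():
            z = op(X[:, i], X[:, j])
            reduced = np.column_stack([z, rest])
            results.append((nn_residual(reduced, y), f"{name}(x{i}, x{j})"))
    return sorted(results)

# Synthetic example: x0 and x1 enter the target only through their ratio.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 3.0, size=(800, 3))
y = np.sin(3.0 * X[:, 0] / X[:, 1]) + X[:, 2]

for score, name in rank_substitutions(X, y)[:3]:
    print(f"{name}: {score:.3f}")
```

In an iterative procedure of the kind the abstract describes, the best-scoring substitution would be accepted only if it passes a validity threshold, the two variables would be replaced by the new one, and the search would repeat on the reduced dataset before handing it to a symbolic regression solver. The threshold and stopping rule are left out here because the abstract does not state them.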