Dimension Reduction for Symbolic Regression

📅 2025-06-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenges of high formula complexity and poor recovery of true solutions in high-dimensional symbolic regression, this paper proposes an iterative variable-combination dimensionality reduction method grounded in functional dependency analysis. The method automatically identifies fixed algebraic combinations—such as sums, products, or ratios—among input variables and replaces them with new synthetic variables, thereby substantially reducing the search-space dimensionality while preserving semantic equivalence. Its core innovation lies in rigorously formalizing the validity of variable substitutions as a functional dependency verification problem and tightly coupling this verification with symbolic regression solvers to achieve end-to-end automated dimensionality reduction. Experiments demonstrate that the method robustly identifies effective variable combinations and consistently improves formula recovery rates and convergence speed for leading symbolic regression algorithms—including SR-Tree, AI-Feynman, and DeepSymbolic—across multiple benchmark datasets. On average, recovered expressions exhibit over 35% lower complexity.

Technology Category

Application Category

📝 Abstract
Solutions of symbolic regression problems are expressions that are composed of input variables and operators from a finite set of function symbols. One measure for evaluating symbolic regression algorithms is their ability to recover formulae, up to symbolic equivalence, from finite samples. Not unexpectedly, the recovery problem becomes harder when the formula gets more complex, that is, when the number of variables and operators gets larger. Variables in naturally occurring symbolic formulas often appear only in fixed combinations. This can be exploited in symbolic regression by substituting one new variable for the combination, effectively reducing the number of variables. However, finding valid substitutions is challenging. Here, we address this challenge by searching over the expression space of small substitutions and testing for validity. The validity test is reduced to a test of functional dependence. The resulting iterative dimension reduction procedure can be used with any symbolic regression approach. We show that it reliably identifies valid substitutions and significantly boosts the performance of different types of state-of-the-art symbolic regression algorithms.
Problem

Research questions and friction points this paper is trying to address.

Reducing variables in symbolic regression via substitutions
Identifying valid variable combinations for dimension reduction
Improving symbolic regression algorithm performance through iteration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Substitutes variables with combined new variables
Tests validity via functional dependence checks
Iteratively reduces dimensions in symbolic regression
🔎 Similar Papers
No similar papers found.
Paul Kahlmeyer
Paul Kahlmeyer
PhD student, FSU Jena
Machine learning
M
Markus Fischer
Friedrich Schiller University Jena
J
Joachim Giesen
Friedrich Schiller University Jena