Equality Graph Assisted Symbolic Regression

📅 2025-11-02
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
In symbolic regression, genetic programming (GP) suffers from search inefficiency due to neutrality: a recent study found that up to 60% of expression evaluations are redundant. To address this, we propose SymRegg, a new symbolic regression algorithm built around equality graphs (e-graphs). The e-graph compactly represents and incrementally maintains sets of equivalent expressions so that redundant evaluations can be skipped, while lightweight perturbation and selection strategies explore the expression space efficiently. SymRegg achieves accuracy comparable to GP while avoiding re-evaluation of expressions the search has already visited, which improves search efficiency. It requires only a minimal set of hyperparameters (population size and number of iterations) and generalizes robustly across diverse datasets. Experiments on multiple benchmarks show that SymRegg attains high predictive accuracy with low computational overhead.

๐Ÿ“ Abstract
In Symbolic Regression (SR), Genetic Programming (GP) is a popular search algorithm that delivers state-of-the-art results in terms of accuracy. Its success relies on the concept of neutrality, which induces large plateaus that the search can safely navigate toward more promising regions. Navigating these plateaus, while necessary, requires the computation of redundant expressions, up to 60% of the total number of evaluations, as noted in a recent study. The equality graph (e-graph) structure can compactly store and group equivalent expressions, enabling us to verify whether a given expression and its variations were already visited by the search, and thus to avoid unnecessary computation. We propose a new search algorithm for symbolic regression called SymRegg that revolves around the e-graph structure and follows simple steps: perturb solutions sampled from a selection of expressions stored in the e-graph; if the perturbation generates an unvisited expression, insert it into the e-graph and generate its equivalent forms. We show that SymRegg can improve the efficiency of the search and maintains consistently accurate results across different datasets, while requiring only a minimal set of hyperparameters.
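The loop described in the abstract (select, perturb, check whether the result was already visited, insert, generate equivalent forms) can be sketched as a toy program. This is a hedged illustration, not the authors' implementation: a plain Python set stands in for the e-graph (an e-graph holds the same equivalence information far more compactly), and the rewrite rules, the subtree-replacement mutation, the "sample among the best `pop_size`" selection, the depth cap, and all names such as `symregg_like`, `perturb`, and `equivalent_forms` are hypothetical choices made for the example.

```python
import random

# Toy expressions: nested tuples (op, left, right) with leaves "x" or a float constant.
OPS = ("+", "-", "*")
TERMINALS = ("x", 1.0, 2.0)
MAX_DEPTH = 6  # hypothetical depth cap that keeps values finite

def evaluate(e, x):
    if e == "x":
        return x
    if isinstance(e, float):
        return e
    op, a, b = e
    va, vb = evaluate(a, x), evaluate(b, x)
    return va + vb if op == "+" else va * vb if op == "*" else va - vb

def depth(e):
    return 1 + max(depth(e[1]), depth(e[2])) if isinstance(e, tuple) else 0

def mse(e, data):
    return sum((evaluate(e, x) - y) ** 2 for x, y in data) / len(data)

def root_rewrites(e):
    # Hypothetical rule set: commutativity, a * 1 = a, a + a = 2 * a.
    if isinstance(e, tuple):
        op, a, b = e
        if op in ("+", "*"):
            yield (op, b, a)
        if op == "*" and b == 1.0:
            yield a
        if op == "+" and a == b:
            yield ("*", 2.0, a)

def rewrites(e):
    # Every expression reachable by one rule application at any position.
    yield from root_rewrites(e)
    if isinstance(e, tuple):
        op, a, b = e
        for a2 in rewrites(a):
            yield (op, a2, b)
        for b2 in rewrites(b):
            yield (op, a, b2)

def equivalent_forms(e, limit=50):
    # Bounded closure of e under the rules (an e-graph would share these subterms).
    seen, frontier = {e}, [e]
    while frontier and len(seen) < limit:
        for nxt in rewrites(frontier.pop()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

def random_tree(rng, d):
    if d == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    return (rng.choice(OPS), random_tree(rng, d - 1), random_tree(rng, d - 1))

def perturb(e, rng):
    # Hypothetical mutation: replace one random subtree with a fresh random tree.
    if not isinstance(e, tuple) or rng.random() < 0.3:
        return random_tree(rng, 2)
    op, a, b = e
    if rng.random() < 0.5:
        return (op, perturb(a, rng), b)
    return (op, a, perturb(b, rng))

def symregg_like(data, iterations=200, pop_size=10, seed=0):
    rng = random.Random(seed)
    visited = set()  # stand-in for the e-graph: visited expressions plus equivalent forms
    scored = []      # (mse, expr) for every expression actually evaluated
    counts = {"evaluated": 0, "skipped": 0}

    def try_insert(e):
        if e in visited:  # an equivalent form was already seen: skip evaluation
            counts["skipped"] += 1
            return
        counts["evaluated"] += 1
        scored.append((mse(e, data), e))
        visited.update(equivalent_forms(e))

    for _ in range(pop_size):
        try_insert(random_tree(rng, 2))
    for _ in range(iterations):
        scored.sort(key=lambda t: t[0])            # selection: sample among the
        parent = rng.choice(scored[:pop_size])[1]  # pop_size best expressions so far
        child = perturb(parent, rng)
        if depth(child) <= MAX_DEPTH:
            try_insert(child)
    return min(scored, key=lambda t: t[0]), counts["evaluated"], counts["skipped"]

data = [(v, v * v + v) for v in (i / 4 for i in range(-8, 9))]  # target: x*x + x
(best_err, best_expr), n_eval, n_skip = symregg_like(data)
print(best_expr, best_err, n_eval, n_skip)
```

The `skipped` counter reports how many candidates the visited check saved from evaluation in this toy run. The flat set grows with every equivalent form it stores, which is precisely the memory cost an e-graph avoids by sharing subterms among e-classes.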
Problem

Research questions and friction points this paper is trying to address.

Reducing redundant expression evaluations in symbolic regression
Using equality graphs to compactly store equivalent expressions
Improving search efficiency while maintaining accuracy across datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Equality graph compactly stores equivalent expressions
SymRegg perturbs solutions sampled from the e-graph
Avoids redundant computations by tracking visited expressions
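To make "compactly stores equivalent expressions" and "tracking visited expressions" concrete, here is a minimal e-graph: a union-find over e-class ids plus a hash-cons table mapping each canonical e-node (an operator applied to child e-class ids) to its e-class. This is a simplified sketch, not SymRegg's data structure: the class and method names are hypothetical, there is no e-class analysis, and `rebuild` is a naive fixed-point loop rather than the incremental rebuilding that production e-graph libraries use. The rewrite `x * 1 = x` is applied once by hand through `union`, standing in for an equality-saturation step.

```python
class EGraph:
    """Minimal e-graph sketch: union-find over e-class ids plus a hash-cons table.

    An e-node is a tuple (op, child_class_id, ...); leaves are 1-tuples such as ("x",).
    """

    def __init__(self):
        self.parent = []    # union-find parent pointer for each e-class id
        self.hashcons = {}  # canonical e-node -> e-class id

    def find(self, a):
        # Follow parent pointers to the representative id (with path halving).
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]
            a = self.parent[a]
        return a

    def canonicalize(self, node):
        op, *kids = node
        return (op, *(self.find(k) for k in kids))

    def add(self, node):
        # Reuse the existing e-class if an identical canonical e-node is stored.
        node = self.canonicalize(node)
        if node in self.hashcons:
            return self.find(self.hashcons[node])
        cid = len(self.parent)
        self.parent.append(cid)
        self.hashcons[node] = cid
        return cid

    def lookup(self, node):
        cid = self.hashcons.get(self.canonicalize(node))
        return None if cid is None else self.find(cid)

    def union(self, a, b):
        # Record that e-classes a and b denote equal expressions.
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
        return a

    def rebuild(self):
        # Naive fixed point: re-canonicalize every e-node and merge classes whose
        # e-nodes became identical (congruence closure).
        changed = True
        while changed:
            changed = False
            new = {}
            for node, cid in self.hashcons.items():
                node, cid = self.canonicalize(node), self.find(cid)
                if node in new and self.find(new[node]) != cid:
                    self.union(new[node], cid)
                    changed = True
                new[node] = self.find(cid)
            self.hashcons = new

    def add_expr(self, expr):
        # Expressions are nested tuples ("op", arg, ...) with string leaves.
        if isinstance(expr, str):
            return self.add((expr,))
        op, *args = expr
        return self.add((op, *(self.add_expr(a) for a in args)))

    def lookup_expr(self, expr):
        # Return the e-class of expr if it is already represented, else None.
        if isinstance(expr, str):
            return self.lookup((expr,))
        op, *args = expr
        ids = [self.lookup_expr(a) for a in args]
        if None in ids:
            return None
        return self.lookup((op, *ids))


eg = EGraph()
x_times_1 = eg.add_expr(("*", "x", "1"))
eg.union(x_times_1, eg.add_expr("x"))  # rewrite x * 1 = x, applied once by hand
eg.rebuild()

a = eg.add_expr(("+", ("*", "x", "1"), "y"))
b = eg.lookup_expr(("+", "x", "y"))
print(a == b)                           # True: x + y is already represented
print(eg.lookup_expr(("+", "y", "x")))  # None: no commutativity rule was applied
```

After the union, the e-class of `x` contains the e-node `("*", <that same class>, "1")`, so one finite class represents `x`, `x * 1`, `(x * 1) * 1`, and so on. That sharing is where the compactness comes from, and a single `lookup_expr` call is enough to tell the search that a candidate is a variation of something it has already evaluated.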
Fabricio Olivetti de Franca
Federal University of ABC, Santo Andre, SP, Brazil
Gabriel Kronberger
University of Applied Sciences Upper Austria
Symbolic Regression, Equation Discovery, Machine Learning