Equality Graph Assisted Symbolic Regression

📅 2025-11-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In symbolic regression, genetic programming (GP) relies on neutrality to cross large plateaus in the search space, but this is costly: up to 60% of expression evaluations can be redundant. To address this, we propose SymRegg, a symbolic regression algorithm built around equality graphs (e-graphs). It compactly represents and incrementally maintains sets of equivalent expressions so that expressions already visited are not evaluated again, and it uses lightweight perturbation and selection strategies to explore the expression space efficiently. SymRegg achieves GP-level modeling accuracy while skipping these redundant evaluations, which significantly improves search efficiency. It retains the expressions it visits, needs only a minimal set of hyperparameters (population size and number of iterations), and generalizes robustly across diverse datasets. Experiments on multiple benchmarks show that SymRegg combines high predictive accuracy with low computational overhead.

📝 Abstract

In Symbolic Regression (SR), Genetic Programming (GP) is a popular search algorithm that delivers state-of-the-art results in terms of accuracy. Its success relies on the concept of neutrality, which induces large plateaus that the search can safely traverse to reach more promising regions. Navigating these plateaus, while necessary, requires computing redundant expressions, up to 60% of the total number of evaluations, as noted in a recent study. The equality graph (e-graph) structure can compactly store and group equivalent expressions, enabling us to verify whether a given expression and its variations were already visited by the search, and thus to avoid unnecessary computation. We propose a new search algorithm for symbolic regression called SymRegg that revolves around the e-graph structure and follows simple steps: perturb solutions sampled from a selection of expressions stored in the e-graph; if this generates an unvisited expression, insert it into the e-graph and generate its equivalent forms. We show that SymRegg improves the efficiency of the search and maintains consistently accurate results across different datasets while requiring only a minimal set of hyperparameters.
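
The perturb / check / insert loop from the abstract can be sketched in Python. This is a simplified illustration, not the authors' implementation: a plain set of explicit expression trees stands in for the e-graph, and "generate its equivalent forms" is approximated by a capped breadth-first closure under a tiny hypothetical rule set (commutativity of `+` and `*`, and `a * 1 -> a`). The toy grammar, the selection scheme (a parent drawn uniformly from the `pop_size` lowest-error expressions), the subtree mutation, the depth limit, and all function names are assumptions made for this example.

```python
import random

# Toy grammar (assumed): "x", "1", or (op, left, right) with op in {"+", "*"}.
OPS = ("+", "*")
LEAVES = ("x", "1")
MAX_DEPTH = 5  # hypothetical depth limit that keeps expressions small


def evaluate(e, x):
    if e == "x":
        return x
    if e == "1":
        return 1.0
    op, a, b = e
    va, vb = evaluate(a, x), evaluate(b, x)
    return va + vb if op == "+" else va * vb


def mse(e, xs, ys):
    return sum((evaluate(e, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def rewrites(e):
    """All one-step rewrites of e under the assumed rules a op b -> b op a and a * 1 -> a."""
    out = set()
    if isinstance(e, tuple):
        op, a, b = e
        out.add((op, b, a))
        if op == "*" and b == "1":
            out.add(a)
        if op == "*" and a == "1":
            out.add(b)
        out.update((op, a2, b) for a2 in rewrites(a))
        out.update((op, a, b2) for b2 in rewrites(b))
    return out


def equivalent_forms(e, limit=200):
    """Breadth-first closure of rewrites; stops expanding once at least `limit` forms are known."""
    seen, frontier = {e}, [e]
    while frontier and len(seen) < limit:
        nxt = []
        for f in frontier:
            for g in rewrites(f) - seen:
                seen.add(g)
                nxt.append(g)
        frontier = nxt
    return seen


def random_expr(rng, depth):
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(LEAVES)
    return (rng.choice(OPS), random_expr(rng, depth - 1), random_expr(rng, depth - 1))


def mutate(e, rng, budget=MAX_DEPTH):
    """Subtree mutation: replace one random subtree with a fresh random tree (depth stays <= budget)."""
    if not isinstance(e, tuple) or rng.random() < 0.3:
        return random_expr(rng, min(2, budget))
    op, a, b = e
    if rng.random() < 0.5:
        return (op, mutate(a, rng, budget - 1), b)
    return (op, a, mutate(b, rng, budget - 1))


def search(xs, ys, iterations=300, pop_size=10, seed=0):
    rng = random.Random(seed)
    visited = set()  # every known form equivalent to an already-evaluated expression
    scored = {}      # evaluated expression -> mean squared error
    skipped = 0      # evaluations avoided because the child was already visited
    start = random_expr(rng, 3)
    scored[start] = mse(start, xs, ys)
    visited |= equivalent_forms(start)
    for _ in range(iterations):
        # Selection (assumed): parent drawn uniformly from the pop_size lowest-error expressions.
        elite = sorted(scored, key=scored.get)[:pop_size]
        child = mutate(rng.choice(elite), rng)
        if child in visited:
            skipped += 1  # equivalent to something already evaluated: no fitness computation
            continue
        scored[child] = mse(child, xs, ys)
        visited |= equivalent_forms(child)
    best = min(scored, key=scored.get)
    return best, scored[best], skipped


xs = [i / 4 for i in range(-8, 9)]
ys = [x * x + x + 1 for x in xs]
best, err, skipped = search(xs, ys)
print(best, err, skipped)
```

Because every equivalent form of an evaluated expression is marked as visited, a child that differs from a known expression only by one of these rewrites is skipped without computing its error; `skipped` counts the avoided evaluations. An actual e-graph records the same equivalences far more compactly, since it shares subterms instead of enumerating every variant explicitly.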

Problem

Research questions and friction points this paper is trying to address.

Reducing redundant expression evaluations in symbolic regression

Using equality graphs to compactly store equivalent expressions

Improving search efficiency while maintaining accuracy across datasets
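
To make the redundancy concrete, the following Python illustration enumerates every expression tree up to depth 2 over a toy grammar (the variable `x`, the constant `1`, and the operators `+` and `*`, all assumed for this example) and counts how many distinct functions those trees denote. Functional equivalence is approximated by a numerical fingerprint, the values at five sample points; this is a stand-in chosen for the illustration, not the paper's method. Every tree here is a polynomial of degree at most 4, and two distinct such polynomials cannot agree at five distinct points, so the fingerprint separates them up to floating-point rounding.

```python
# Toy grammar (assumed): leaves "x" and "1", binary operators "+" and "*".
LEAVES = ["x", "1"]
OPS = ["+", "*"]
SAMPLE_POINTS = [0.5, 1.7, -2.3, 3.1, 4.2]  # arbitrary distinct points


def enumerate_exprs(depth):
    """All trees over LEAVES and OPS with depth <= `depth` (leaves have depth 0)."""
    if depth == 0:
        return list(LEAVES)
    sub = enumerate_exprs(depth - 1)
    return list(LEAVES) + [(op, a, b) for op in OPS for a in sub for b in sub]


def evaluate(e, x):
    if e == "x":
        return x
    if e == "1":
        return 1.0
    op, a, b = e
    va, vb = evaluate(a, x), evaluate(b, x)
    return va + vb if op == "+" else va * vb


def fingerprint(e):
    """Values at the sample points, rounded so that reordered arithmetic compares equal."""
    return tuple(round(evaluate(e, x), 9) for x in SAMPLE_POINTS)


exprs = enumerate_exprs(2)
distinct = {fingerprint(e) for e in exprs}
print(f"{len(exprs)} trees, {len(distinct)} distinct functions")
```

Every tree beyond the first one per fingerprint would cost an evaluation in a naive search without changing what the search can learn about the data, which is the kind of redundancy an e-graph lets the search detect before evaluating.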

Innovation

Methods, ideas, or system contributions that make the work stand out.

An equality graph compactly stores equivalent expressions

The SymRegg algorithm perturbs solutions sampled from the e-graph

Avoids redundant computations by tracking visited expressions
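
For readers unfamiliar with e-graphs, this Python sketch shows the generic structure: hash-consed e-nodes whose children are e-class ids, union-find over e-classes, and a rebuild step that restores congruence, in the style popularized by the egg library. It is not SymRegg's implementation. The names `EGraph`, `add_expr`, `lookup`, and `apply_commutativity` are hypothetical, the rebuild is a naive fixpoint loop rather than egg's worklist algorithm, and commutativity of `+` and `*` is the only rewrite rule assumed. `lookup` answers the question the abstract describes: whether a given expression, or an equivalent variation of it, is already stored.

```python
class EGraph:
    """Minimal e-graph: e-nodes are (op, child_class_ids...) tuples; e-classes are union-find sets."""

    def __init__(self):
        self.parent = []    # union-find parent pointer for each e-class id
        self.hashcons = {}  # canonical e-node -> e-class id

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def canonicalize(self, node):
        op, *kids = node
        return (op, *(self.find(k) for k in kids))

    def add_node(self, node):
        node = self.canonicalize(node)
        if node in self.hashcons:
            return self.find(self.hashcons[node])  # shared: no new e-class
        cid = len(self.parent)
        self.parent.append(cid)
        self.hashcons[node] = cid
        return cid

    def add_expr(self, expr):
        """Insert a tree ("x", "1", or (op, left, right)) bottom-up; return its e-class."""
        if isinstance(expr, tuple):
            op, a, b = expr
            return self.add_node((op, self.add_expr(a), self.add_expr(b)))
        return self.add_node((expr,))

    def lookup(self, expr):
        """E-class of expr if an equivalent term is already represented, else None."""
        if isinstance(expr, tuple):
            op, a, b = expr
            ca, cb = self.lookup(a), self.lookup(b)
            if ca is None or cb is None:
                return None
            node = self.canonicalize((op, ca, cb))
        else:
            node = (expr,)
        cid = self.hashcons.get(node)
        return None if cid is None else self.find(cid)

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
            self.rebuild()
        return self.find(a)

    def rebuild(self):
        """Re-canonicalize e-nodes and merge e-classes whose nodes now coincide (congruence)."""
        changed = True
        while changed:
            changed = False
            new = {}
            for node, cid in self.hashcons.items():
                node, cid = self.canonicalize(node), self.find(cid)
                if node in new and self.find(new[node]) != cid:
                    self.parent[cid] = self.find(new[node])
                    changed = True
                new[node] = self.find(cid)
            self.hashcons = new


def apply_commutativity(eg):
    """One pass of a op b -> b op a for + and *; return True if any e-classes were merged."""
    changed = False
    for node, cid in list(eg.hashcons.items()):
        if node[0] in ("+", "*"):
            op, a, b = node
            other = eg.add_node((op, b, a))
            if eg.find(other) != eg.find(cid):
                eg.union(cid, other)
                changed = True
    return changed


eg = EGraph()
c1 = eg.add_expr(("+", ("*", "x", "x"), "1"))  # x * x + 1
while apply_commutativity(eg):                  # saturate the single rule
    pass
print(eg.lookup(("+", "1", ("*", "x", "x"))) == eg.find(c1))  # → True
print(eg.lookup(("+", "x", "1")))                             # → None
```

Both orderings of `x * x + 1` end up in a single e-class, and the subterm `x * x` is stored as one e-node shared by both, which is the compactness that lets a search recognize a variation as already visited without storing or evaluating it separately.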

🔎 Similar Papers

OmniPred: Language Models as Universal Regressors

2024-02-22 · arXiv.org · Citations: 8
