🤖 AI Summary
To address the exponentially large search space and semantic blindness of traditional genetic programming (GP) in symbolic regression, this paper proposes Transformer Semantic Genetic Programming (TSGP). The method uses a pre-trained transformer model as a semantic variation operator, employing a target semantic distance to control the semantic similarity between parent and offspring programs and thereby balance exploration and exploitation. A single transformer trained on millions of programs generalizes across symbolic regression problems of varying dimension. Evaluated on 24 real-world and synthetic benchmarks, TSGP achieves an average rank of 1.58, significantly outperforming standard GP, SLIM_GSGP, Deep Symbolic Regression, and Denoising Autoencoder GP, while producing more compact solutions than SLIM_GSGP despite its higher accuracy.
📝 Abstract
Transformer Semantic Genetic Programming (TSGP) is a semantic search approach that uses a pre-trained transformer model as a variation operator to generate offspring programs with controlled semantic similarity to a given parent. Unlike other semantic GP approaches that rely on fixed syntactic transformations, TSGP aims to learn diverse structural variations that lead to solutions with similar semantics. We find that a single transformer model trained on millions of programs is able to generalize across symbolic regression problems of varying dimension. Evaluated on 24 real-world and synthetic datasets, TSGP significantly outperforms standard GP, SLIM_GSGP, Deep Symbolic Regression, and Denoising Autoencoder GP, achieving an average rank of 1.58 across all benchmarks. Moreover, TSGP produces more compact solutions than SLIM_GSGP despite achieving higher accuracy. In addition, the target semantic distance $\mathrm{SD}_t$ controls the step size in the semantic space: small values of $\mathrm{SD}_t$ enable consistent improvement in fitness but often lead to larger programs, while larger values promote faster convergence and compactness. Thus, $\mathrm{SD}_t$ provides an effective mechanism for balancing exploration and exploitation.
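To make the role of $\mathrm{SD}_t$ concrete: in semantic GP, a program's semantics is typically its vector of outputs on the training cases, and semantic distance is a distance between such vectors. The sketch below is an illustrative assumption, not the paper's implementation; `semantic_distance` and `accept_offspring` are hypothetical names, and the Euclidean metric and tolerance rule stand in for whatever distance and acceptance criterion TSGP actually uses.

```python
import numpy as np

def semantic_distance(parent_out, child_out):
    # Semantics = vector of program outputs on the training cases.
    # Here we assume a Euclidean distance between those vectors.
    parent_out = np.asarray(parent_out, dtype=float)
    child_out = np.asarray(child_out, dtype=float)
    return float(np.linalg.norm(parent_out - child_out))

def accept_offspring(parent_out, child_out, sd_target, tol=0.25):
    # Hypothetical acceptance rule: keep an offspring only if its
    # semantic distance to the parent is within a relative tolerance
    # of the target step size SD_t. Small sd_target -> small, safe
    # steps (exploitation); large sd_target -> big jumps (exploration).
    sd = semantic_distance(parent_out, child_out)
    return abs(sd - sd_target) <= tol * sd_target
```

For example, an offspring whose outputs differ from the parent's by a Euclidean distance of 5.0 would be accepted when `sd_target=5.0` but rejected when `sd_target=10.0` under the default tolerance.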