🤖 AI Summary
To address the exponentially large search space and semantic blindness of traditional genetic programming (GP) in symbolic regression, this paper proposes Transformer Semantic Genetic Programming (TSGP). The method uses a pre-trained transformer model as a semantic variation operator, employing a target semantic distance to control the semantic similarity between parent and offspring programs and thereby balance exploration and exploitation. A single transformer trained on millions of programs generalizes across symbolic regression problems of varying dimension. Evaluated on 24 real-world and synthetic benchmarks, TSGP achieves an average rank of 1.58, significantly outperforming standard GP, SLIM_GSGP, Deep Symbolic Regression, and Denoising Autoencoder GP, while producing more compact solutions than SLIM_GSGP despite its higher accuracy.
📝 Abstract
Transformer Semantic Genetic Programming (TSGP) is a semantic search approach that uses a pre-trained transformer model as a variation operator to generate offspring programs with controlled semantic similarity to a given parent. Unlike other semantic GP approaches that rely on fixed syntactic transformations, TSGP aims to learn diverse structural variations that lead to solutions with similar semantics. We find that a single transformer model trained on millions of programs is able to generalize across symbolic regression problems of varying dimension. Evaluated on 24 real-world and synthetic datasets, TSGP significantly outperforms standard GP, SLIM_GSGP, Deep Symbolic Regression, and Denoising Autoencoder GP, achieving an average rank of 1.58 across all benchmarks. Moreover, TSGP produces more compact solutions than SLIM_GSGP despite achieving higher accuracy. In addition, the target semantic distance $\mathrm{SD}_t$ controls the step size in the semantic space: small values of $\mathrm{SD}_t$ enable consistent improvement in fitness but often lead to larger programs, while larger values promote faster convergence and compactness. Thus, $\mathrm{SD}_t$ provides an effective mechanism for balancing exploration and exploitation.
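To make the role of $\mathrm{SD}_t$ concrete: in semantic GP, a program's semantics is typically its vector of outputs on the training cases, and semantic distance is a distance between such vectors. The sketch below is an illustrative assumption, not the paper's implementation; `semantic_distance` and `accept_offspring` are hypothetical names, and the Euclidean metric and tolerance rule stand in for whatever distance and acceptance criterion TSGP actually uses.

```python
import numpy as np

def semantic_distance(parent_out, child_out):
    # Semantics = vector of program outputs on the training cases.
    # Here we assume a Euclidean distance between those vectors.
    parent_out = np.asarray(parent_out, dtype=float)
    child_out = np.asarray(child_out, dtype=float)
    return float(np.linalg.norm(parent_out - child_out))

def accept_offspring(parent_out, child_out, sd_target, tol=0.25):
    # Hypothetical acceptance rule: keep an offspring only if its
    # semantic distance to the parent is within a relative tolerance
    # of the target step size SD_t. Small sd_target -> small, safe
    # steps (exploitation); large sd_target -> big jumps (exploration).
    sd = semantic_distance(parent_out, child_out)
    return abs(sd - sd_target) <= tol * sd_target
```

For example, an offspring whose outputs differ from the parent's by a Euclidean distance of 5.0 would be accepted when `sd_target=5.0` but rejected when `sd_target=10.0` under the default tolerance.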