🤖 AI Summary
Problem: Gene Expression Programming (GEP) for symbolic regression relies on random initialization, which leads to slow convergence and high computational cost. Method: This paper introduces the first symbolic transfer learning paradigm tailored for evolutionary algorithms. It serializes historically optimized mathematical expressions into structured textual sequences and uses Transformer-based language models to extract operator combinations and subexpression patterns that transfer across tasks. Combined with semantic similarity metrics, these patterns drive knowledge-based generation of the initial population. Contribution/Results: The approach addresses the initialization bottleneck of conventional GEP. Evaluated on multiple symbolic regression benchmarks and real-world computational fluid dynamics tasks, it reaches equivalent accuracy in 42% fewer iterations on average, while also improving solution quality and interpretability.
📝 Abstract
Gene expression programming is an evolutionary optimization algorithm with the potential to generate interpretable and easily implementable equations for regression problems. Although knowledge gained from previous optimizations may be available, the initial candidate solutions are typically generated randomly and often include only features or terms based on preliminary user assumptions. This random initial guess, which places no constraints on the search space, typically increases the computational cost of finding an optimal solution. Meanwhile, transfer learning, a technique for reusing parts of trained models, has been successfully applied to neural networks. However, no generalized strategy exists for applying it to symbolic regression with evolutionary algorithms. In this work, we propose an approach for integrating transfer learning with gene expression programming applied to symbolic regression. The framework uses natural language processing techniques to identify correlations and recurring patterns in equations explored during previous optimizations, which allows knowledge acquired on similar tasks to be transferred to new ones. In an empirical evaluation on a range of univariate problems from an open database and from the field of computational fluid dynamics, initial solutions derived via the transfer learning mechanism improve the algorithm's convergence rate towards better solutions.
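The idea of seeding the initial population from previously explored equations can be illustrated with a minimal sketch. The abstract does not specify the language model or the encoding, so everything here is an assumption for illustration: a simple bigram count model stands in for the NLP component, the symbol set in `ARITY` and `TERMINALS` is hypothetical, and the prior expressions are assumed to be already tokenized in Karva (breadth-first) order. The sketch relies only on the standard GEP head/tail layout, in which a tail of length `head_len * (max_arity - 1) + 1` made of terminals guarantees that every sampled gene decodes to a valid expression.

```python
import random
from collections import Counter, defaultdict

# Hypothetical symbol set (not from the paper): function arities and terminals.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "sin": 1, "exp": 1}
TERMINALS = ["x", "c"]
START = "<s>"  # marker for the beginning of an expression


def fit_bigrams(expressions):
    """Count symbol-to-next-symbol transitions in tokenized prior expressions."""
    counts = defaultdict(Counter)
    for tokens in expressions:
        prev = START
        for tok in tokens:
            counts[prev][tok] += 1
            prev = tok
    return counts


def sample_gene(counts, head_len, rng):
    """Sample one gene: head drawn from the bigram model, tail from terminals."""
    vocab = list(ARITY) + TERMINALS
    head, prev = [], START
    for _ in range(head_len):
        options = counts.get(prev)
        if options:
            symbols, weights = zip(*options.items())
            tok = rng.choices(symbols, weights=weights, k=1)[0]
        else:
            tok = rng.choice(vocab)  # unseen context: fall back to uniform
        head.append(tok)
        prev = tok
    # Standard GEP tail length, so the gene is valid for any head content.
    tail_len = head_len * (max(ARITY.values()) - 1) + 1
    tail = [rng.choice(TERMINALS) for _ in range(tail_len)]
    return head + tail


def seeded_population(expressions, size, head_len=5, seed=0):
    """Build an initial population biased towards patterns seen in prior runs."""
    rng = random.Random(seed)
    counts = fit_bigrams(expressions)
    return [sample_gene(counts, head_len, rng) for _ in range(size)]


# Made-up equations standing in for results of previous optimizations.
prior = [
    ["*", "x", "sin", "x"],
    ["+", "*", "x", "x", "c"],
    ["*", "c", "exp", "x"],
]
population = seeded_population(prior, size=4)
for gene in population:
    print(" ".join(gene))
```

With these made-up priors, every sampled gene starts with `*` or `+`, because those are the only symbols observed at the root. A purely random initializer would instead draw the root uniformly from all symbols, which is the unconstrained starting point the abstract argues against.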