Enhancing Generalization in Evolutionary Feature Construction for Symbolic Regression Through Vicinal Jensen Gap Minimization

📅 2026-02-02
🏛️ IEEE Transactions on Evolutionary Computation
📈 Citations: 1 (influential: 0)
🤖 AI Summary
This work addresses the limited generalization of genetic programming in symbolic regression caused by overfitting. The authors propose an evolutionary feature construction method built on a vicinal risk decomposition that, for the first time, uses the vicinal Jensen gap as a regularization term, jointly optimizing empirical risk and the Jensen gap. To improve robustness, the approach integrates dynamic adjustment of the regularization strength, manifold intrusion detection, and a noise perturbation mechanism, mitigating the unrealistic samples that data augmentation can produce. Experiments on 58 benchmark datasets show that the proposed method outperforms existing complexity-control measures and that the resulting symbolic regression models compare favorably with 15 state-of-the-art machine learning algorithms.
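To make the joint objective concrete, below is a minimal, hypothetical sketch of a mixup-style vicinal Jensen gap penalty added to the empirical risk of a candidate model. The function names, the squared-error loss, the Beta-distributed mixing coefficients, and the fixed `lam_strength` weight are illustrative assumptions for this example, not the paper's exact estimator or its dynamically adjusted regularization schedule.

```python
# Minimal, hypothetical sketch (not the authors' implementation) of a
# mixup-style vicinal Jensen gap penalty added to the empirical risk.
import numpy as np

def squared_error(y_pred, y_true):
    """Per-sample squared error, used here as an illustrative loss."""
    return (y_pred - y_true) ** 2

def jensen_gap_fitness(model, X, y, n_pairs=256, alpha=1.0,
                       lam_strength=0.1, rng=None):
    """Empirical risk plus an estimated vicinal (mixup) Jensen gap.

    `model` is any callable mapping an (n, d) array to n predictions;
    `lam_strength` stands in for the paper's dynamically adjusted
    regularization strength.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Empirical risk on the original training samples.
    empirical_risk = squared_error(model(X), y).mean()

    # Draw mixup pairs and Beta-distributed mixing coefficients.
    i = rng.integers(0, len(X), size=n_pairs)
    j = rng.integers(0, len(X), size=n_pairs)
    lam = rng.beta(alpha, alpha, size=n_pairs)

    x_mix = lam[:, None] * X[i] + (1.0 - lam[:, None]) * X[j]
    y_mix = lam * y[i] + (1.0 - lam) * y[j]

    # Jensen gap estimate: discrepancy between the loss on the mixed
    # samples and the corresponding mixture of per-sample losses.
    loss_on_mix = squared_error(model(x_mix), y_mix)
    mix_of_losses = (lam * squared_error(model(X[i]), y[i])
                     + (1.0 - lam) * squared_error(model(X[j]), y[j]))
    jensen_gap = np.abs(loss_on_mix - mix_of_losses).mean()

    # Joint objective: empirical risk plus the weighted vicinal Jensen gap.
    return empirical_risk + lam_strength * jensen_gap
```

In an evolutionary feature construction loop, a value like this could serve as the fitness of a candidate individual, so that selection favors models whose loss varies smoothly across vicinal (mixup) samples rather than models that only fit the original training points.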

📝 Abstract
Genetic programming-based feature construction has achieved significant success in recent years as an automated machine learning technique to enhance learning performance. However, overfitting remains a challenge that limits its broader applicability. To improve generalization, we prove that vicinal risk, estimated through noise perturbation or mixup-based data augmentation, is bounded by the sum of empirical risk and a regularization term, either a finite-difference term or the vicinal Jensen gap. Leveraging this decomposition, we propose an evolutionary feature construction framework that jointly optimizes empirical risk and the vicinal Jensen gap to control overfitting. Since datasets may vary in noise levels, we develop a noise estimation strategy to dynamically adjust regularization strength. Furthermore, to mitigate manifold intrusion, where data augmentation may generate unrealistic samples that fall outside the data manifold, we propose a manifold intrusion detection mechanism. Experimental results on 58 datasets demonstrate the effectiveness of Jensen gap minimization compared to other complexity measures. Comparisons with 15 machine learning algorithms further indicate that genetic programming with the proposed overfitting control strategy achieves superior performance.
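As a rough illustration of the manifold intrusion issue described above, the sketch below flags a mixup-generated sample whose mixed target strongly disagrees with the target of a nearby real training point. The function name, the nearest-neighbor test, and the `dist_tol`/`target_tol` thresholds are assumptions made for this example; the paper's actual detection mechanism may work differently.

```python
# Hypothetical sketch of a manifold intrusion check for mixup-augmented
# samples; the thresholds are illustrative placeholders, not the paper's
# actual detection mechanism.
import numpy as np

def is_manifold_intrusion(x_mix, y_mix, X, y, dist_tol=0.5, target_tol=0.5):
    """Flag an augmented sample that conflicts with nearby real data.

    A mixed point is treated as intruding when it lies within `dist_tol`
    of a real training sample whose target differs from the mixed target
    by more than `target_tol`.
    """
    dists = np.linalg.norm(X - x_mix, axis=1)  # distance to every real sample
    nearest = int(np.argmin(dists))
    close_to_real_sample = dists[nearest] < dist_tol
    target_conflict = abs(float(y[nearest]) - float(y_mix)) > target_tol
    return bool(close_to_real_sample and target_conflict)
```

Samples flagged this way could then be discarded or down-weighted before they contribute to the vicinal risk estimate.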
Problem

Research questions and friction points this paper is trying to address.

overfitting
generalization
symbolic regression
feature construction
genetic programming
Innovation

Methods, ideas, or system contributions that make the work stand out.

vicinal Jensen gap
evolutionary feature construction
symbolic regression
manifold intrusion detection
generalization enhancement