🤖 AI Summary
Selecting mathematical expressions in symbolic regression that balance accuracy, simplicity, and generalization remains a significant challenge. This study presents a systematic empirical comparison of model selection criteria (AIC, AICc, BIC, MDL, and Efron's bootstrap estimate of the in-sample prediction error) within a unified experimental framework. Genetic programming for symbolic regression typically produces a Pareto-optimal set of candidate expressions; here the candidates are generated by perturbing ground-truth functions, and each criterion is assessed on its generalization error and on its probability of selecting the ground-truth expression across seven synthetic datasets with Gaussian noise. MDL selects models with the lowest test error and the shortest expression length on most datasets, and MDL and BIC have the highest probability of selecting the ground-truth expression, although no single criterion dominates all results.
📝 Abstract
Effective model selection is critical in symbolic regression (SR) to identify mathematical expressions that balance accuracy and complexity and that have low expected error on unseen data. Many modern implementations of genetic programming (GP) for SR generate a set of Pareto-optimal candidate solutions, but reliable automatic selection of solutions that generalize well remains an open issue. Current literature offers various information-theoretic and Bayesian approaches, yet comprehensive comparisons of their performance across different data regimes are limited. This study presents a systematic empirical comparison, on seven synthetic datasets with Gaussian noise, of widely used selection criteria: the Akaike information criterion (AIC), the corrected AIC (AICc), the Bayesian information criterion (BIC), minimum description length (MDL), and Efron's bootstrap estimate of the in-sample prediction error. We rank candidate expressions generated by perturbing ground-truth functions to assess generalization error and the probability of selecting the ground-truth expression. Our findings reveal that MDL consistently identifies models with the lowest test error and the shortest length across most datasets. While no single criterion dominates all results, MDL and BIC produce the highest probability of selecting the ground-truth expression.
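To make the ranking step concrete, the following is a minimal sketch of how three of the criteria named above (AIC, AICc, BIC) could score candidate models under a Gaussian noise assumption. It is not the authors' implementation. The function name `gaussian_information_criteria`, the polynomial candidates, the noise level, and the convention of counting the noise variance as one fitted parameter are all assumptions made for illustration. MDL and Efron's bootstrap estimate are left out because the abstract does not specify the formulations used.

```python
import math
import numpy as np


def gaussian_information_criteria(residuals, k):
    """Return AIC, AICc and BIC for a model with i.i.d. Gaussian errors.

    residuals: training residuals of the fitted candidate.
    k: number of fitted parameters (assumption: includes the noise variance).
    """
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.size
    # Maximum-likelihood estimate of the noise variance.
    sigma2 = float(np.sum(residuals ** 2)) / n
    # -2 * (maximised Gaussian log-likelihood).
    neg2_loglik = n * (math.log(2.0 * math.pi * sigma2) + 1.0)
    aic = 2.0 * k + neg2_loglik
    # Small-sample correction; requires n > k + 1.
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)
    bic = k * math.log(n) + neg2_loglik
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Hypothetical setup: a linear ground truth with Gaussian noise, and
# polynomial fits of increasing degree standing in for candidate expressions.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)
y = 2.0 * x + 1.0 + rng.normal(scale=0.3, size=x.size)

scores = {}
for degree in (1, 3, 6):
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    # degree + 1 polynomial coefficients, plus one for the noise variance.
    scores[f"degree {degree}"] = gaussian_information_criteria(residuals, k=degree + 2)

# Each criterion selects the candidate with the smallest score.
for criterion in ("AIC", "AICc", "BIC"):
    best = min(scores, key=lambda name: scores[name][criterion])
    print(criterion, "selects", best)
```

All three scores share the same goodness-of-fit term and differ only in the complexity penalty: 2 per parameter for AIC, an extra small-sample correction for AICc, and ln(n) per parameter for BIC. BIC therefore penalizes additional parameters more heavily than AIC whenever n exceeds about 7.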