Diversified Residual Symbolic Regression

📅 2026-05-15
📈 Citations: 0
Influential: 0

🤖 AI Summary
Real-world data often contain outliers or arise from a mixture of generative processes, and conventional symbolic regression, which does not consider how residuals are distributed, can then fail to uncover the underlying relationships. This work brings residual-pattern diversity into symbolic regression through a Quality-Diversity optimization framework. The framework searches for multiple interpretable analytical expressions that each fit the data well but differ in how their residuals are distributed across observations. On a synthetic mixture dataset, the method captures multiple underlying relationships and produces more diverse expressions than conventional symbolic regression. On a real-world astronomical dataset, it discovers multiple expressions consistent with known physical relationships, which lets users select the expression that best matches their domain knowledge.
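To make "distinct residual structures" concrete, the following is a minimal sketch rather than the paper's experiment. The two laws, the noise level, the tolerance `tol`, and the helper `inlier_mask` are all assumptions chosen for illustration. It builds a toy mixture in which each observation follows one of two laws and shows that each candidate expression has small residuals on a different subset of the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mixture: each point follows either y = 2x or y = x^2.
# x is drawn from [2.5, 4.0] so the two laws never come close to each other.
x = rng.uniform(2.5, 4.0, size=200)
from_law_a = rng.random(200) < 0.5
y = np.where(from_law_a, 2.0 * x, x**2) + rng.normal(0.0, 0.01, size=200)

# Two candidate expressions, as a symbolic regressor might return them.
candidates = {
    "2*x": lambda x: 2.0 * x,
    "x**2": lambda x: x**2,
}

def inlier_mask(f, x, y, tol=0.1):
    """Boolean residual pattern: True where |y - f(x)| is below tol."""
    return np.abs(y - f(x)) < tol

masks = {name: inlier_mask(f, x, y) for name, f in candidates.items()}

# Share of points that both expressions fit; zero means the expressions
# explain disjoint subsets of the observations.
overlap = float(np.mean(masks["2*x"] & masks["x**2"]))

for name, mask in masks.items():
    print(f"{name}: fits {mask.mean():.0%} of the points")
print(f"points fitted by both: {overlap:.0%}")
```

Each expression looks like a poor global fit if the other law's points are counted as ordinary errors, yet each is exact on its own subset. Which subset counts as "outliers" depends on the reader, which is the ambiguity the summary describes.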
📝 Abstract
Symbolic regression (SR) aims to discover explicit mathematical expressions that explain observed data and is widely used in domains where interpretability is essential. Because interpretability requires expressions to reflect meaningful regularities, SR is sensitive to observations that deviate from the dominant relationship. Such irregular observations, or outliers, are common in real-world data and can hinder SR from identifying underlying regularities. Robust regression mitigates this by downweighting observations with large residuals. However, deciding which observations should be treated as outliers is often ambiguous and depends on user interpretation and domain knowledge, a perspective largely overlooked in existing SR studies. This motivates approaches that present multiple candidate expressions, allowing users to examine different residual patterns and choose expressions consistent with their expertise. We propose diversified residual symbolic regression (DRSR), which achieves high predictive accuracy while promoting diversity with respect to residual patterns based on the Quality-Diversity paradigm. DRSR collects multiple expressions that fit the data well but differ in how residuals are distributed, enabling post-search selection aligned with domain knowledge. On a synthetic mixture dataset, DRSR produces more diverse expressions than conventional SR while capturing multiple underlying relationships. On a real-world astronomical dataset, DRSR discovers multiple expressions consistent with known physical relationships.
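The idea of collecting expressions that fit well but differ in residual pattern can be sketched with a small novelty-style archive. This is an illustrative sketch only, not DRSR itself: the abstract does not specify the descriptor, the quality measure, or the search operators, so the inlier-mask descriptor, the Hamming-distance niche test, the names `update_archive`, `tol`, `min_dist`, `min_coverage`, and the hand-written candidate list are all assumptions.

```python
import numpy as np

def update_archive(archive, name, residuals, tol=0.1, min_dist=0.2, min_coverage=0.3):
    """Insert a candidate into a quality-diversity style archive (hypothetical design).

    Descriptor: boolean mask of points with |residual| < tol.
    Quality: negative mean absolute residual over those inlier points.
    """
    mask = np.abs(residuals) < tol
    coverage = float(mask.mean())
    if coverage < min_coverage:
        return archive  # fits too few points to count as a good expression
    entry = {
        "name": name,
        "mask": mask,
        "coverage": coverage,
        "quality": -float(np.mean(np.abs(residuals[mask]))),
    }
    if not archive:
        archive.append(entry)
        return archive
    # Normalised Hamming distance between residual patterns.
    dists = [float(np.mean(e["mask"] != mask)) for e in archive]
    nearest = int(np.argmin(dists))
    if dists[nearest] >= min_dist:
        archive.append(entry)        # new residual pattern: open a new niche
    elif entry["quality"] > archive[nearest]["quality"]:
        archive[nearest] = entry     # same pattern, better fit: replace
    return archive

# Noise-free toy mixture: even-indexed points follow 2x, odd-indexed points x^2.
x = np.linspace(2.5, 4.0, 100)
y = np.where(np.arange(100) % 2 == 0, 2.0 * x, x**2)

# Hand-written stand-ins for expressions proposed by a search procedure.
proposals = [
    ("2*x + 0.05", lambda x: 2.0 * x + 0.05),
    ("2*x", lambda x: 2.0 * x),
    ("x**2", lambda x: x**2),
    ("0*x + 1", lambda x: 0.0 * x + 1.0),
]

archive = []
for name, f in proposals:
    archive = update_archive(archive, name, y - f(x))

print([e["name"] for e in archive])
```

In this toy run the slightly offset expression is replaced by the exact one in the same niche, the second law opens its own niche, and the constant expression is rejected for fitting too few points. A real system would generate proposals with an evolutionary or other search procedure instead of a fixed list.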
Problem

Research questions and friction points this paper is trying to address.

Symbolic Regression
Outliers
Residual Patterns
Interpretability
Robust Regression
Innovation

Methods, ideas, or system contributions that make the work stand out.

Symbolic Regression
Quality-Diversity
Residual Diversity
Robust Regression
Interpretability