FePySR: A Neural Feature Extraction Framework for Efficient and Scalable Symbolic Regression

📅 2026-05-12

🤖 AI Summary
This work addresses the vast search space and high computational cost of symbolic regression with a two-stage framework. First, a heterogeneous neural network extracts candidate feature expressions from observational data; PySR then performs structural optimization within this compressed expression space. Decoupling neural feature extraction from the symbolic search shrinks the search space and improves recovery rates. Across five standard benchmarks, the method achieves higher equation recovery rates than existing methods. On a set of 75 highly complex synthetic equations, it recovers 36 and, on the unrecovered cases, yields substantially smaller mean squared errors with less computation time than PySR. The first stage also performs consistently as the number of selected top features varies and as noise in the data increases. Applied to ordinary differential equations governing biological systems, it identifies the governing equations in 24 of 100 tests, where PySR recovers none.
📝 Abstract
A fundamental challenge in symbolic regression (SR) is efficiently recovering complex mathematical expressions from observational data. Although this problem is NP-hard, many expressions of practical interest decompose naturally into combinations of nonlinear feature modules, concentrating structural complexity into a small number of reusable components. Here, we introduce FePySR, a two-stage framework that reduces the SR search space by extracting valid features prior to equation search. FePySR first employs a heterogeneous neural network to constrain observational data to a set of candidate expressions, then performs structural optimization within this refined expression space using PySR. Across five standard benchmarks, FePySR outperforms state-of-the-art methods by achieving higher equation recovery rates. On a set of 75 highly complex synthesized equations, FePySR recovers 36 equations, while producing substantially smaller mean squared errors on the remaining unrecovered cases, with reduced computation time compared to PySR. FePySR's first stage also maintains consistent performance under varying numbers of selected top features and increasing levels of noise in the observational data. Applied to ordinary differential equations governing biological systems, FePySR successfully identifies governing equations in 24 out of 100 tests where PySR recovers none. Taken together, FePySR is a generalizable framework that can enhance SR solvers, enabling the efficient and reliable recovery of symbolic expressions across scientific domains.
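The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the FePySR implementation: the abstract does not describe the network or the PySR configuration, so both stages are replaced by simple stand-ins. Stage 1 ranks a hypothetical hand-written feature library by absolute Pearson correlation with the target (in place of the heterogeneous neural network), and stage 2 brute-forces pairs of the kept features with a least-squares fit (in place of PySR). The library `CANDIDATES`, the helper names, the target equation, and `k=3` are all assumptions made for illustration.

```python
import itertools
import math

# Hypothetical candidate feature library (assumption; not from the paper).
CANDIDATES = {
    "sin(x0)": lambda x0, x1: math.sin(x0),
    "cos(x0)": lambda x0, x1: math.cos(x0),
    "x0": lambda x0, x1: x0,
    "x1": lambda x0, x1: x1,
    "x1**2": lambda x0, x1: x1 * x1,
    "x0*x1": lambda x0, x1: x0 * x1,
    "exp(x1)": lambda x0, x1: math.exp(x1),
}


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def select_features(X, y, k):
    """Stage 1 stand-in: keep the k features most correlated with y."""
    cols = {name: [f(*row) for row in X] for name, f in CANDIDATES.items()}
    ranked = sorted(cols, key=lambda name: abs(pearson(cols[name], y)), reverse=True)
    return {name: cols[name] for name in ranked[:k]}


def fit_pair(f, g, y):
    """Least-squares fit y ~ a*f + b*g via the 2x2 normal equations."""
    sff = sum(u * u for u in f)
    sgg = sum(v * v for v in g)
    sfg = sum(u * v for u, v in zip(f, g))
    sfy = sum(u * t for u, t in zip(f, y))
    sgy = sum(v * t for v, t in zip(g, y))
    det = sff * sgg - sfg * sfg
    if abs(det) < 1e-12:  # collinear features, skip this pair
        return None
    a = (sfy * sgg - sgy * sfg) / det
    b = (sgy * sff - sfy * sfg) / det
    mse = sum((a * u + b * v - t) ** 2 for u, v, t in zip(f, g, y)) / len(y)
    return a, b, mse


def search_pairs(features, y):
    """Stage 2 stand-in: try every pair of kept features, return the lowest-MSE fit."""
    best = None
    for (n1, f), (n2, g) in itertools.combinations(features.items(), 2):
        fit = fit_pair(f, g, y)
        if fit is not None and (best is None or fit[2] < best[1]):
            best = ({n1: fit[0], n2: fit[1]}, fit[2])
    return best


# Noise-free toy data on a grid; the target 3*sin(x0) + 2*x1**2 is an assumed example.
grid0 = [-3 + 6 * i / 20 for i in range(21)]
grid1 = [-2 + 4 * j / 20 for j in range(21)]
X = [(x0, x1) for x0 in grid0 for x1 in grid1]
y = [3 * math.sin(x0) + 2 * x1 * x1 for x0, x1 in X]

selected = select_features(X, y, k=3)
model, mse = search_pairs(selected, y)
print(sorted(selected), model, mse)
```

In this toy setting, the pairwise search over the full library would cover 21 pairs (7 choose 2), while the search over the three kept features covers only 3. That is the kind of search-space reduction the abstract attributes to extracting features before the equation search, here on a far smaller scale and with much cruder components.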
Problem

Research questions and friction points this paper is trying to address.

symbolic regression
mathematical expression recovery
NP-hard problem
feature extraction
equation discovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

Symbolic Regression
Neural Feature Extraction
Two-stage Framework
Equation Discovery
Scalable SR