Symbolic Regression via Latent Iterative Refinement

📅 2026-05-26

🤖 AI Summary

This work addresses the amortization gap in neural symbolic regression, where a single-pass prediction falls short of the true posterior and struggles to balance expression accuracy against simplicity. The authors propose the Latent Equation Embedding (LEE) framework, which combines iterative amortized inference with differentiable function evaluation. LEE embeds symbolic expressions and observational data jointly in a shared latent space grounded in functional behavior, and refines candidates by alternating discrete re-encoding with continuous gradient descent. The encoder itself acts as a learned optimizer, iteratively refining expressions in the latent space. On SRBench, LEE produces expressions with complexity 8–11, compared with 20–90 for the strongest accuracy-oriented baselines (2–10× simpler), and its performance degrades gracefully as noise increases.

📝 Abstract

Symbolic regression (SR) seeks closed-form mathematical expressions that fit observed data. Neural SR methods amortize the search by training an encoder to map observations directly to expressions in a single pass, but this amortized inference leaves a residual amortization gap between its one-shot prediction and the true posterior. We propose Latent Equation Embedding (LEE), a framework that closes this gap through iterative amortized inference in a functionally grounded latent space. LEE learns a shared latent space Z equipped with three components: an encoder f_theta that jointly embeds symbolic tokens and numerical observations into a single latent vector z; an expression decoder g_expr that reconstructs formulas from z; and an evaluation decoder g_eval that predicts function values from z, explicitly grounding the latent space in functional behavior. At inference, LEE performs iterative refinement by re-encoding decoded expressions jointly with observations, progressively improving the latent estimate. LEE uses the encoder itself as a learned inference optimizer: each re-encoding step implicitly computes the mismatch between the candidate and the data. Because g_eval is differentiable in z, we additionally interleave continuous gradient descent with discrete re-encoding, yielding a hybrid iterative and gradient refinement procedure. On SRBench across three noise levels, against 19 baselines spanning genetic programming, symbolic-neural hybrids, and pre-trained Transformers, LEE produces expressions 2–10× simpler than the strongest accuracy-oriented baselines, including Operon, GP-GOMEA, TPSR, RAG-SR, and GenSR, with complexity 8–11 versus 20–90. These results advance the low-complexity region of the accuracy-complexity Pareto frontier and show graceful degradation as noise increases.
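
The hybrid refinement loop described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the three functions below are hand-written stand-ins for the trained networks f_theta, g_expr, and g_eval, the latent z is assumed to be just a (slope, intercept) pair, and the names `refine`, `encode`, `mse`, and `grad_mse` as well as all hyperparameters are hypothetical. The sketch only shows the control flow: gradient steps on z through the differentiable evaluation decoder, then decoding to a discrete expression, then re-encoding that expression together with the data.

```python
from typing import List, Tuple

Latent = Tuple[float, float]      # toy latent z = (slope, intercept); an assumption
Data = List[Tuple[float, float]]  # observations as (x, y) pairs


def g_eval(z: Latent, x: float) -> float:
    """Toy evaluation decoder: predicts f(x) from z and is differentiable in z."""
    return z[0] * x + z[1]


def g_expr(z: Latent) -> Latent:
    """Toy expression decoder: snaps z to a discrete formula a*x + b
    whose coefficients lie on a 0.5 grid."""
    return (round(z[0] * 2) / 2, round(z[1] * 2) / 2)


def mse(z: Latent, data: Data) -> float:
    """Mean squared error of the evaluation decoder on the observations."""
    return sum((g_eval(z, x) - y) ** 2 for x, y in data) / len(data)


def grad_mse(z: Latent, data: Data) -> Latent:
    """Analytic gradient of mse with respect to z for the linear toy g_eval."""
    n = len(data)
    g_slope = sum(2 * (g_eval(z, x) - y) * x for x, y in data) / n
    g_intercept = sum(2 * (g_eval(z, x) - y) for x, y in data) / n
    return (g_slope, g_intercept)


def encode(expr: Latent, data: Data) -> Latent:
    """Toy encoder: embeds a candidate expression jointly with the data.
    It starts from the candidate's coefficients and shifts the intercept by
    half the mean residual, a crude stand-in for a learned mismatch signal."""
    mean_residual = sum(y - g_eval(expr, x) for x, y in data) / len(data)
    return (expr[0], expr[1] + 0.5 * mean_residual)


def refine(data: Data, z0: Latent = (0.0, 0.0), rounds: int = 5,
           grad_steps: int = 20, lr: float = 0.05) -> Tuple[Latent, float]:
    """Alternate continuous gradient descent on z with discrete re-encoding."""
    z = z0
    best_expr = g_expr(z)
    best_err = mse(best_expr, data)
    for _ in range(rounds):
        # Continuous phase: descend the data-fit loss through g_eval.
        for _ in range(grad_steps):
            g = grad_mse(z, data)
            z = (z[0] - lr * g[0], z[1] - lr * g[1])
        # Discrete phase: decode a candidate expression and keep the best one.
        expr = g_expr(z)
        err = mse(expr, data)
        if err < best_err:
            best_expr, best_err = expr, err
        # Re-encode the candidate jointly with the observations.
        z = encode(expr, data)
    return best_expr, best_err


# Usage: noiseless samples of y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(-2, 3)]
expr, err = refine(data)
print(expr, err)
```

On this noiseless toy data the loop recovers slope 2 and intercept 1 after the first round. In the actual framework the decoded object would be a token sequence and the encoder a trained network, so the grid snapping here should be read only as a placeholder for the discrete decoding step.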

Problem

Research questions and friction points this paper is trying to address.

Symbolic Regression

Amortization Gap

Latent Space

Expression Complexity

Iterative Inference

Innovation

Methods, ideas, or system contributions that make the work stand out.

Symbolic Regression

Latent Iterative Refinement

Amortized Inference