🤖 AI Summary
Existing symbolic regression (SR) methods suffer from poor scalability, inconsistent symbolic representations, and redundant expressions. To address these issues, this paper proposes LIES, an end-to-end differentiable, fixed-architecture neural network whose activations are four interpretable primitive functions (Log, Identity, Exp, and Sin), giving every learned model the same symbolic vocabulary. LIES is trained with an oversampling strategy and a loss function tailored to promote sparsity and prevent gradient instability; heuristic expression extraction and post-training pruning then simplify the learned network into a compact formula while preserving fidelity. On standard SR benchmarks, LIES consistently outperforms state-of-the-art baselines, producing significantly shorter expressions with lower approximation error. Ablation studies confirm that the architecture, the sparsity-promoting loss, and the pruning module each contribute to interpretability, generalization, and overall SR performance.
📝 Abstract
Symbolic regression (SR) aims to discover closed-form mathematical expressions that accurately describe data, offering interpretability and analytical insight beyond what standard black-box models provide. Existing SR methods often rely on population-based search or autoregressive modeling, both of which struggle with scalability and symbolic consistency. We introduce LIES (Logarithm, Identity, Exponential, Sine), a fixed neural network architecture with interpretable primitive activations that are optimized to model symbolic expressions. We develop a framework to extract compact formulae from LIES networks by training with an appropriate oversampling strategy and a tailored loss function that promotes sparsity and prevents gradient instability. After training, the framework applies additional pruning strategies to further simplify the learned expressions into compact formulae. Our experiments on SR benchmarks show that the LIES framework consistently produces sparse and accurate symbolic formulae, outperforming all baselines. We also demonstrate the importance of each design component through ablation studies.
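To make the idea of a network with Log, Identity, Exp, and Sin activations more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the layer wiring, the safeguards against gradient instability, the exact loss, or the pruning rule. The names `safe_log`, `safe_exp`, `lies_layer`, `l1_penalty`, and `prune`, the constants `EPS` and `EXP_CLIP`, the L1 form of the sparsity term, and the magnitude threshold are all hypothetical choices made for this example.

```python
import math

# Hypothetical safeguards (assumptions, not taken from the paper).
EPS = 1e-6       # keeps the log argument away from zero
EXP_CLIP = 20.0  # caps the exponent so exp cannot overflow


def safe_log(z):
    """Log of |z| + EPS, so the primitive is defined for every real input."""
    return math.log(abs(z) + EPS)


def safe_exp(z):
    """Exp with its argument clamped to [-EXP_CLIP, EXP_CLIP]."""
    return math.exp(max(-EXP_CLIP, min(EXP_CLIP, z)))


# One primitive per unit, in the order Log, Identity, Exp, Sin.
PRIMITIVES = (safe_log, lambda z: z, safe_exp, math.sin)


def lies_layer(x, weights, biases):
    """Apply one illustrative layer with four units.

    x:       list of input floats
    weights: four rows, each with len(x) floats (one row per primitive)
    biases:  four floats
    Returns [log(z0), z1, exp(z2), sin(z3)] where z_k = w_k . x + b_k.
    """
    out = []
    for f, w_row, b in zip(PRIMITIVES, weights, biases):
        z = sum(w * xi for w, xi in zip(w_row, x)) + b
        out.append(f(z))
    return out


def l1_penalty(weights, lam=1e-3):
    """Assumed sparsity term: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for row in weights for w in row)


def prune(weights, threshold=1e-2):
    """Assumed post-training pruning: zero out small-magnitude weights."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]
```

With inputs `x = [x1, x2]` and weight rows that each select a single input, the layer's outputs read directly as the terms `log|x1|`, `x2`, `exp(0)`, and `sin(0.005 * x1)`. After `prune`, the weight on the last term drops to zero, which is the kind of simplification that turns a trained network into a shorter formula.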