AI Summary
Neural networks exhibit poor distillability in symbolic regression, leading to low robustness and fidelity in symbolic formula extraction. Method: We propose a Jacobian regularization-based teacher network optimization method that explicitly constrains the local smoothness of the teacher's output during training, thereby guiding it to learn function structures more amenable to symbolic reconstruction and improving the quality of knowledge distillation into symbolic models. Contribution/Results: Integrated with standard distillation frameworks, our approach achieves an average relative improvement of 120% in R² score for the final symbolic models across multiple real-world regression tasks, without degrading teacher prediction accuracy. To our knowledge, this is the first work to introduce Jacobian regularization into symbolic distillation, enhancing neural network symbolizability at the functional-prior level and significantly improving both the accuracy and stability of symbolic recovery.
Abstract
Distilling large neural networks into simple, human-readable symbolic formulas is a promising path toward trustworthy and interpretable AI. However, this process is often brittle, as the complex functions learned by standard networks are poor targets for symbolic discovery, resulting in low-fidelity student models. In this work, we propose a novel training paradigm to address this challenge. Instead of passively distilling a pre-trained network, we introduce a \textbf{Jacobian-based regularizer} that actively encourages the ``teacher'' network to learn functions that are not only accurate but also inherently smoother and more amenable to distillation. We demonstrate through extensive experiments on a suite of real-world regression benchmarks that our method is highly effective. By optimizing the regularization strength for each problem, we improve the $R^2$ score of the final distilled symbolic model by an average of \textbf{120\% (relative)} compared to the standard distillation pipeline, all while maintaining the teacher's predictive accuracy. Our work presents a practical and principled method for significantly improving the fidelity of interpretable models extracted from complex neural networks.
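To make the idea of a Jacobian-based regularizer concrete, the sketch below shows one plausible form of the training objective: the teacher's task loss plus a penalty on the squared Frobenius norm of the input-output Jacobian, which discourages sharp local variation. This is a minimal NumPy illustration assuming a tiny MLP teacher and a central finite-difference Jacobian estimate; the network shape, the penalty weight `lam`, and all function names are illustrative, not the paper's actual implementation.

```python
import numpy as np

def mlp_forward(params, x):
    """Tiny one-hidden-layer MLP teacher: f(x) = tanh(x W1 + b1) W2 + b2."""
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

def jacobian_penalty(params, x, eps=1e-4):
    """Estimate mean squared Frobenius norm of df/dx over the batch
    via central finite differences (one perturbation per input dim)."""
    n, d = x.shape
    total = 0.0
    for j in range(d):
        dx = np.zeros_like(x)
        dx[:, j] = eps
        # central difference approximates the j-th Jacobian column
        col = (mlp_forward(params, x + dx) - mlp_forward(params, x - dx)) / (2 * eps)
        total += np.sum(col ** 2)
    return total / n

def regularized_loss(params, x, y, lam):
    """Task MSE plus lam * Jacobian smoothness penalty."""
    pred = mlp_forward(params, x)
    mse = np.mean((pred - y) ** 2)
    return mse + lam * jacobian_penalty(params, x)
```

With `lam = 0` this reduces to plain MSE training; increasing `lam` trades a little teacher accuracy for locally smoother functions that, per the abstract's argument, are easier targets for symbolic recovery. In an autodiff framework the finite-difference loop would typically be replaced by exact gradients (e.g. a double-backward pass).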