AI Summary
Neural networks exhibit poor distillability in symbolic regression, leading to low robustness and fidelity in symbolic formula extraction. Method: We propose a Jacobian regularization-based teacher network optimization method that explicitly constrains the local smoothness of the teacher's output during training, thereby guiding it to learn function structures more amenable to symbolic reconstruction and improving the quality of knowledge distillation into symbolic models. Contribution/Results: Integrated with standard distillation frameworks, our approach achieves an average relative improvement of 120% in R² score for the final symbolic models across multiple real-world regression tasks, without degrading teacher prediction accuracy. To our knowledge, this is the first work to introduce Jacobian regularization into symbolic distillation, enhancing neural network symbolizability at the functional-prior level and significantly improving both the accuracy and stability of symbolic recovery.
Abstract
Distilling large neural networks into simple, human-readable symbolic formulas is a promising path toward trustworthy and interpretable AI. However, this process is often brittle, as the complex functions learned by standard networks are poor targets for symbolic discovery, resulting in low-fidelity student models. In this work, we propose a novel training paradigm to address this challenge. Instead of passively distilling a pre-trained network, we introduce a \textbf{Jacobian-based regularizer} that actively encourages the ``teacher'' network to learn functions that are not only accurate but also inherently smoother and more amenable to distillation. We demonstrate through extensive experiments on a suite of real-world regression benchmarks that our method is highly effective. By optimizing the regularization strength for each problem, we improve the $R^2$ score of the final distilled symbolic model by an average of \textbf{120\% (relative)} compared to the standard distillation pipeline, all while maintaining the teacher's predictive accuracy. Our work presents a practical and principled method for significantly improving the fidelity of interpretable models extracted from complex neural networks.
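To make the idea of a Jacobian-based regularizer concrete, the sketch below shows one plausible form of the training objective: the teacher's task loss plus a penalty on the squared Frobenius norm of the input-output Jacobian, which discourages sharp local variation. This is a minimal NumPy illustration assuming a tiny MLP teacher and a central finite-difference Jacobian estimate; the network shape, the penalty weight `lam`, and all function names are illustrative, not the paper's actual implementation.

```python
import numpy as np

def mlp_forward(params, x):
    """Tiny one-hidden-layer MLP teacher: f(x) = tanh(x W1 + b1) W2 + b2."""
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

def jacobian_penalty(params, x, eps=1e-4):
    """Estimate mean squared Frobenius norm of df/dx over the batch
    via central finite differences (one perturbation per input dim)."""
    n, d = x.shape
    total = 0.0
    for j in range(d):
        dx = np.zeros_like(x)
        dx[:, j] = eps
        # central difference approximates the j-th Jacobian column
        col = (mlp_forward(params, x + dx) - mlp_forward(params, x - dx)) / (2 * eps)
        total += np.sum(col ** 2)
    return total / n

def regularized_loss(params, x, y, lam):
    """Task MSE plus lam * Jacobian smoothness penalty."""
    pred = mlp_forward(params, x)
    mse = np.mean((pred - y) ** 2)
    return mse + lam * jacobian_penalty(params, x)
```

With `lam = 0` this reduces to plain MSE training; increasing `lam` trades a little teacher accuracy for locally smoother functions that, per the abstract's argument, are easier targets for symbolic recovery. In an autodiff framework the finite-difference loop would typically be replaced by exact gradients (e.g. a double-backward pass).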