Closed-Form Interpretation of Neural Network Latent Spaces with Symbolic Gradients

📅 2024-09-09
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
Neural network latent spaces are semantically opaque, which hinders scientific insight and trust. Method: The paper proposes a prior-free, closed-form analytical framework that maps individual hidden neurons directly to human-interpretable mathematical expressions. It introduces equivalence classes of neuron functions and combines deep symbolic regression (via gradient-guided symbolic search), gradient-driven latent space analysis, and Siamese-network-based representation learning to obtain concept-level explanations through intersection operations in the symbolic search space. Results/Contribution: Evaluated on several physics and mathematics benchmark tasks, the method recovers matrix invariants and conserved quantities of dynamical systems, achieving 92% latent concept recovery accuracy and substantially outperforming existing post-hoc explanation methods. Its core contribution is a verifiable, closed-form correspondence between neurons and mathematical expressions, enabling end-to-end translation from opaque neural representations to interpretable scientific concepts.
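
The gradient-guided comparison described above can be pictured with a short sketch: because any invertible reparametrisation of a latent neuron encodes the same concept, a candidate symbolic expression can be scored by how well its normalised input gradients align with those of the neuron. The sketch below is a minimal illustration assuming a PyTorch encoder with a single scalar latent output and a sympy candidate expression; the function names and the cosine-similarity criterion are illustrative, not the authors' exact procedure.

```python
import numpy as np
import sympy as sp
import torch

def normalized_input_gradients(model, x):
    """Unit-norm gradients of the scalar latent neuron with respect to the inputs x (batch, d)."""
    x = x.clone().requires_grad_(True)
    z = model(x).sum()                      # summing over the batch gives all gradients in one pass
    (grad,) = torch.autograd.grad(z, x)
    return grad / (grad.norm(dim=1, keepdim=True) + 1e-12)

def candidate_gradients(expr, symbols, x):
    """Unit-norm gradients of a sympy candidate expression evaluated on the same batch."""
    grad_fns = [sp.lambdify(symbols, sp.diff(expr, s), "numpy") for s in symbols]
    cols = [x[:, i].detach().numpy() for i in range(x.shape[1])]
    vals = [np.broadcast_to(fn(*cols), cols[0].shape) for fn in grad_fns]   # handles constant derivatives
    g = torch.tensor(np.stack(vals, axis=1), dtype=torch.float32)
    return g / (g.norm(dim=1, keepdim=True) + 1e-12)

def gradient_alignment(model, expr, symbols, x):
    """Mean |cosine similarity| between neuron and candidate gradients; the absolute value
    reflects that only the gradient direction matters within the equivalence class."""
    gn = normalized_input_gradients(model, x)
    gc = candidate_gradients(expr, symbols, x)
    return (gn * gc).sum(dim=1).abs().mean().item()
```

For flattened 2x2 matrix inputs, candidates such as `x0*x3 - x1*x2` (the determinant) would be scored against a trained encoder, and the highest-scoring expression in the symbolic search space would be taken as the neuron's interpretation.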

📝 Abstract
It has been demonstrated in many scientific fields that artificial neural networks like autoencoders or Siamese networks encode meaningful concepts in their latent spaces. However, there does not exist a comprehensive framework for retrieving this information in a human-readable form without prior knowledge. In order to extract these concepts, we introduce a framework for finding closed-form interpretations of neurons in latent spaces of artificial neural networks. The interpretation framework is based on embedding trained neural networks into an equivalence class of functions that encode the same concept. We interpret these neural networks by finding an intersection between the equivalence class and human-readable equations defined by a symbolic search space. The approach is demonstrated by retrieving invariants of matrices and conserved quantities of dynamical systems from latent spaces of Siamese neural networks.
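
As a rough illustration of the Siamese setting the abstract refers to, the sketch below trains a shared encoder whose single scalar latent neuron must decide whether two randomly drawn 2x2 matrices share an invariant (here the trace, with positive pairs built by adding a traceless perturbation). The architecture, the choice of invariant, and the pairing scheme are assumptions made for illustration, not the authors' setup; the trained neuron is the kind of latent quantity the interpretation framework would then map to a closed-form expression.

```python
import torch
import torch.nn as nn

class SiameseEncoder(nn.Module):
    """Shared encoder whose single latent neuron is the object of interpretation."""
    def __init__(self, in_dim=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )
    def forward(self, x):
        return self.net(x)

def make_pairs(batch=256):
    """Flattened 2x2 matrices; positive pairs share the trace (traceless perturbation),
    negative pairs are independent random matrices."""
    a = torch.randn(batch, 2, 2)
    y = (torch.rand(batch) < 0.5).float()
    c = torch.randn(batch, 2, 2)
    c[:, 1, 1] = -c[:, 0, 0]                               # make the perturbation traceless
    b = torch.where(y.view(-1, 1, 1).bool(), a + c, torch.randn(batch, 2, 2))
    return a.reshape(batch, 4), b.reshape(batch, 4), y

encoder = SiameseEncoder()
head = nn.Linear(1, 1)                                     # decision on the squared latent gap
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):
    xa, xb, y = make_pairs()
    logit = head((encoder(xa) - encoder(xb)) ** 2).squeeze(1)
    loss = loss_fn(logit, y)
    opt.zero_grad(); loss.backward(); opt.step()
```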
Problem

Research questions and friction points this paper is trying to address.

Neural Network Interpretability
Autoencoders
Siamese Networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural Network Interpretability
Mathematical Mapping
Hidden Layer Analysis
Zakaria Patel
Ecomtent & Department of Computer Science, University of Toronto

Sebastian J. Wetzel
University of Waterloo, Waterloo, Ontario N2L 3G1, Canada; Perimeter Institute for Theoretical Physics, Waterloo, Ontario N2L 2Y5, Canada; Homes Plus Magazine Inc., Waterloo, Ontario N2V 2B1, Canada