Explaining the Explainer: Understanding the Inner Workings of Transformer-based Symbolic Regression Models

📅 2026-02-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Transformers have demonstrated strong performance in symbolic regression, yet the causal mechanisms by which they generate mathematical expressions remain unclear. This work proposes PATCHES, an evolutionary circuit discovery algorithm that provides the first circuit-level interpretability analysis of a symbolic regression transformer. PATCHES uses evolutionary search to identify compact, functionally correct internal circuits. The resulting circuits are then validated with causal interventions such as mean patching, under an evaluation framework centered on faithfulness, completeness, and minimality. The authors also test direct logit attribution and probing classifiers and find that these mainly capture correlational rather than causal features, which limits their usefulness for circuit discovery. Experiments recover 28 circuits, suggesting that symbolic regression is a promising testbed for mechanistic interpretability, and the work proposes a principled methodology for circuit discovery.
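To make the three evaluation criteria concrete, the following Python sketch scores candidate circuits on a hypothetical four-component toy model under mean ablation. The component names (`head_a`, `head_b`, `head_c`, `mlp_0`), the accuracy metric, the ratio-based faithfulness score, and the 0.1 minimality threshold are all illustrative assumptions. They follow definitions common in earlier circuit work (faithfulness: the circuit alone matches the full model; completeness: knocking out any subset K hurts the circuit and the full model equally; minimality: every component matters for some K), and are not necessarily the exact criteria PATCHES uses.

```python
from itertools import combinations

# Hypothetical toy model (an assumption, not from the paper): each component adds
# a scalar to the output, and the model predicts "positive" when the sum is > 0.
COMPONENTS = {
    "head_a": lambda x: 2.0 * x,     # main signal-carrying component
    "head_b": lambda x: 1.0 * x,     # weaker backup copy of the same signal
    "head_c": lambda x: 0.2,         # input-independent bias
    "mlp_0": lambda x: 0.1 * x * x,  # small nonlinear term
}
XS = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
LABELS = [x > 0 for x in XS]
# Mean output of each component over XS, substituted when it is ablated.
MEANS = {n: sum(f(x) for x in XS) / len(XS) for n, f in COMPONENTS.items()}


def performance(circuit):
    """Accuracy when every component outside `circuit` is mean-ablated."""
    correct = 0
    for x, label in zip(XS, LABELS):
        out = sum(f(x) if n in circuit else MEANS[n] for n, f in COMPONENTS.items())
        correct += (out > 0) == label
    return correct / len(XS)


def subsets(items):
    """Yield every subset of `items`, from the empty set up to `items` itself."""
    items = list(items)
    for r in range(len(items) + 1):
        yield from (set(c) for c in combinations(items, r))


def faithfulness(circuit):
    # Ratio of circuit performance to full-model performance (1.0 = fully faithful).
    return performance(circuit) / performance(set(COMPONENTS))


def incompleteness(circuit):
    # Worst gap between knocking K out of the circuit and out of the full model.
    full = set(COMPONENTS)
    return max(abs(performance(circuit - k) - performance(full - k))
               for k in subsets(circuit))


def non_minimal(circuit, threshold=0.1):
    # Components v for which no K within (circuit minus v) makes removing v
    # change performance by at least `threshold`.
    return {v for v in circuit
            if all(abs(performance(circuit - k - {v}) - performance(circuit - k)) < threshold
                   for k in subsets(circuit - {v}))}
```

In this toy, `{"head_a"}` is fully faithful (`faithfulness` returns 1.0) but incomplete (`incompleteness` returns 0.5), because `head_b` carries a backup copy of the signal. The pair `{"head_a", "head_b"}` is both faithful and complete, and when `head_c` is added to that pair, `non_minimal` flags it as removable.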

📝 Abstract

Following their success across many domains, transformers have also proven effective for symbolic regression (SR); however, the internal mechanisms underlying their generation of mathematical operators remain largely unexplored. Although mechanistic interpretability has successfully identified circuits in language and vision models, it has not yet been applied to SR. In this article, we introduce PATCHES, an evolutionary circuit discovery algorithm that identifies compact and correct circuits for SR. Using PATCHES, we isolate 28 circuits, providing the first circuit-level characterisation of an SR transformer. We validate these findings through a robust causal evaluation framework based on key notions such as faithfulness, completeness, and minimality. Our analysis shows that mean patching with performance-based evaluation most reliably isolates functionally correct circuits. In contrast, we demonstrate that direct logit attribution and probing classifiers primarily capture correlational features rather than causal ones, limiting their utility for circuit discovery. Overall, these results establish SR as a high-potential application domain for mechanistic interpretability and propose a principled methodology for circuit discovery.
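The abstract describes PATCHES only as an evolutionary search for compact, correct circuits scored by mean patching with performance-based evaluation. The Python sketch that follows shows what such a search can look like under stated assumptions. A circuit is a binary mask over ten hypothetical components, fitness is mean-ablated accuracy minus a per-component penalty (`SIZE_PENALTY`), and new candidates come from truncation selection, uniform crossover, and bit-flip mutation. The toy model and all operators are illustrative choices, not the paper's implementation.

```python
import random

# Hypothetical toy setup (an assumption, not from the paper): 10 components, two
# of which (indices 2 and 7) carry the signal; the rest add small constant biases.
N_COMPONENTS = 10
SIGNAL = {2, 7}
SIZE_PENALTY = 0.02  # fitness cost per kept component, favouring compact circuits
XS = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]


def contribution(i, x):
    return 1.5 * x if i in SIGNAL else 0.01 * (i + 1)


# Per-component mean over XS, substituted when a component is ablated.
MEANS = [sum(contribution(i, x) for x in XS) / len(XS) for i in range(N_COMPONENTS)]


def performance(mask):
    """Sign-prediction accuracy with components whose mask bit is 0 mean-ablated."""
    correct = 0
    for x in XS:
        out = sum(contribution(i, x) if keep else MEANS[i] for i, keep in enumerate(mask))
        correct += (out > 0) == (x > 0)
    return correct / len(XS)


def fitness(mask):
    # Reward correct behaviour, penalise circuit size.
    return performance(mask) - SIZE_PENALTY * sum(mask)


def mutate(mask, rng, rate):
    # Flip each bit independently with probability `rate`.
    return tuple(bit ^ (rng.random() < rate) for bit in mask)


def evolve(generations=40, pop_size=20, rate=0.1, seed=0):
    rng = random.Random(seed)
    # Seed the population with the full model plus random masks.
    population = [tuple([1] * N_COMPONENTS)] + [
        tuple(rng.randint(0, 1) for _ in range(N_COMPONENTS)) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 4]  # survivors kept unchanged (elitism)
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover: each bit comes from either parent at random.
            child = tuple(bit_a if rng.random() < 0.5 else bit_b for bit_a, bit_b in zip(a, b))
            children.append(mutate(child, rng, rate))
        population = elite + children
    best = max(population, key=fitness)
    return {i for i, bit in enumerate(best) if bit}
```

The top quarter of each generation survives unchanged, and the full model is part of the initial population. As a result, the best fitness never drops below that of the full model (0.8 here). With `SIZE_PENALTY = 0.02`, every circuit that loses the signal scores at most 0.5, so the returned circuit is guaranteed to keep at least one of the signal-carrying components while shedding others to reduce its size.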

Problem

Research questions and friction points this paper is trying to address.

symbolic regression

transformer

mechanistic interpretability

circuit discovery

mathematical expression generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

mechanistic interpretability

symbolic regression

circuit discovery