From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current approaches to interpreting neural network mechanisms lack a unified formal representation, which hinders comparison, reuse, and theoretical analysis. This work proposes a two-level signature framework. The first level uses Causal Functional Signatures (CFS), grounded in causal attribution evidence and token role profiles, to characterize computational strategies such as attention-mediated copying and MLP-mediated binding. The second level uses Inductive Logic Programming (ILP) over scale-invariant structural predicates to learn architectural signatures that can be compared via θ-subsumption. Together, the two levels make mechanistic claims explicit, comparable across models, and portable across model scales. In the reported experiments, ILP-derived signatures separate circuit structures substantially better than graph kernel and feature-vector baselines and support mechanism transfer across model scales and architecture families.

📝 Abstract

Mechanistic interpretability produces circuit-level causal analyses of neural network behaviour, but discovered circuits often remain isolated experimental artefacts: there is no shared formal representation for what circuits compute, how they relate, or when two findings provide evidence for the same mechanism. This work provides a formal infrastructure for cumulative mechanistic science by treating circuit interpretation as inductive theory construction. Each circuit is characterised at two levels: a Causal Functional Signature (CFS), which grounds component behaviour in causal attribution evidence and token role profiles, and an architectural signature $τ_{\mathrm{arch}}$, learned by inductive logic programming (ILP) from scale-invariant structural predicates. Together, these constitute a formal coherence layer that makes mechanistic claims explicit, comparable via $θ$-subsumption, and portable across model scales. CFS reveals qualitatively distinct computational strategies across task types, including attention-mediated copying versus MLP-mediated binding. ILP signatures achieve substantially better structural separation than graph kernel and feature-vector baselines, and support principled transfer across model scales and architecture families.
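
The abstract says architectural signatures are comparable via $θ$-subsumption. As a rough illustration of what that comparison means, the sketch below implements the standard textbook test: a clause C θ-subsumes a clause D if some substitution θ for the variables of C makes every literal of Cθ appear in D. This is a minimal sketch, not the paper's implementation. The predicate names (`attends`, `writes`, `mlp_reads`), the tuple encoding of literals, and the convention that capitalised terms are variables are all assumptions made for the example, and clause heads and literal signs are ignored for brevity.

```python
from typing import Dict, List, Optional, Tuple

# A literal is (predicate name, argument terms). Hypothetical encoding:
# terms starting with an uppercase letter are variables, others constants.
Literal = Tuple[str, Tuple[str, ...]]


def is_var(term: str) -> bool:
    return term[:1].isupper()


def match_literal(lit_c: Literal, lit_d: Literal,
                  theta: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Try to extend theta so that lit_c under theta equals lit_d.

    Terms of lit_d are treated as fixed (as if skolemised).
    Returns the extended substitution, or None if no match exists.
    """
    pred_c, args_c = lit_c
    pred_d, args_d = lit_d
    if pred_c != pred_d or len(args_c) != len(args_d):
        return None
    extended = dict(theta)
    for a, b in zip(args_c, args_d):
        if is_var(a):
            # Bind the variable, or check agreement with an earlier binding.
            if extended.setdefault(a, b) != b:
                return None
        elif a != b:
            return None
    return extended


def theta_subsumes(c: List[Literal], d: List[Literal],
                   theta: Optional[Dict[str, str]] = None) -> bool:
    """Backtracking search for a substitution mapping every literal of c into d."""
    theta = {} if theta is None else theta
    if not c:
        return True
    first, rest = c[0], c[1:]
    for lit_d in d:
        extended = match_literal(first, lit_d, theta)
        if extended is not None and theta_subsumes(rest, d, extended):
            return True
    return False


# Hypothetical signatures: a general "copy-like" pattern and a more
# specific circuit description that contains it.
general = [("attends", ("H", "Src", "Dst")), ("writes", ("H", "Dst"))]
specific = [("attends", ("h1", "s", "d")),
            ("writes", ("h1", "d")),
            ("mlp_reads", ("m", "d"))]

print(theta_subsumes(general, specific))
print(theta_subsumes(specific, general))
```

Under this encoding, the general pattern subsumes the specific circuit but not the other way round, which is the sense in which one signature can be read as a generalisation of another. The search is exponential in the worst case, since deciding θ-subsumption is NP-complete in general; practical ILP systems use more careful matching strategies.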

Problem

Research questions and friction points this paper is trying to address.

mechanistic interpretability

circuit representation

formal theory

inductive logic

neural network mechanisms

Innovation

Methods, ideas, or system contributions that make the work stand out.

inductive logic programming

mechanistic interpretability

causal functional signature