Position-aware Automatic Circuit Discovery

📅 2025-02-07
🤖 AI Summary
Existing circuit discovery methods assume that model components are equally relevant at every input position, so they cannot characterize position-dependent mechanisms in language models, especially on tasks with variable-length inputs. This work makes circuit discovery position-sensitive. It extends edge attribution patching, a gradient-based discovery method, to distinguish token positions, and it introduces a dataset schema that defines token spans with similar semantics across examples. An automated pipeline uses large language models to generate and apply the schema. The resulting position-aware circuits achieve better trade-offs between circuit size and faithfulness than position-agnostic prior work.

📝 Abstract
A widely used strategy to discover and understand language model mechanisms is circuit analysis. A circuit is a minimal subgraph of a model's computation graph that executes a specific task. We identify a gap in existing circuit discovery methods: they assume circuits are position-invariant, treating model components as equally relevant across input positions. This limits their ability to capture cross-positional interactions or mechanisms that vary across positions. To address this gap, we propose two improvements to incorporate positionality into circuits, even on tasks containing variable-length examples. First, we extend edge attribution patching, a gradient-based method for circuit discovery, to differentiate between token positions. Second, we introduce the concept of a dataset schema, which defines token spans with similar semantics across examples, enabling position-aware circuit discovery in datasets with variable-length examples. We additionally develop an automated pipeline for schema generation and application using large language models. Our approach enables fully automated discovery of position-sensitive circuits, yielding better trade-offs between circuit size and faithfulness compared to prior work.
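To make the first improvement concrete, here is a minimal sketch of scoring one edge separately at each token position. It assumes the common first-order form of edge attribution patching, in which an edge's score is the dot product between the change in the upstream activation (corrupted minus clean) and the gradient of the task metric at the downstream input. The function names, the toy activations, and the sign convention are illustrative assumptions; the paper's exact formulation may differ.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def eap_scores_per_position(
    clean_out: List[Vector],
    corrupt_out: List[Vector],
    grad_in: List[Vector],
) -> List[float]:
    """Return one attribution score per token position for a single edge.

    clean_out / corrupt_out: upstream node output at each position.
    grad_in: gradient of the metric w.r.t. the downstream input at each position.
    (Hypothetical helper, not the paper's API.)
    """
    scores = []
    for clean, corrupt, grad in zip(clean_out, corrupt_out, grad_in):
        diff = [c - z for c, z in zip(corrupt, clean)]
        scores.append(dot(diff, grad))
    return scores


def eap_score_position_agnostic(
    clean_out: List[Vector],
    corrupt_out: List[Vector],
    grad_in: List[Vector],
) -> float:
    """Collapse the per-position scores into a single edge score by summing."""
    return sum(eap_scores_per_position(clean_out, corrupt_out, grad_in))


# Toy data: 3 token positions, 2-dimensional activations (made up for illustration).
clean = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
corrupt = [[1.0, 0.0], [2.0, 1.0], [0.0, 1.0]]
grad = [[0.5, 0.5], [1.0, 0.0], [2.0, 0.0]]

per_position = eap_scores_per_position(clean, corrupt, grad)
collapsed = eap_score_position_agnostic(clean, corrupt, grad)
```

In this toy case the edge matters at positions 1 and 2 with opposite signs, so the summed score is zero and a position-agnostic method would rank the edge as irrelevant, while the per-position scores keep both contributions visible.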
Problem

Research questions and friction points this paper is trying to address.

Enhances circuit discovery with position-aware methods.
Addresses limitations in capturing cross-positional model interactions.
Automates schema generation for variable-length dataset analysis.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Position-aware circuit discovery
Edge attribution patching
Dataset schema automation
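
The dataset schema idea can be illustrated with a small sketch: each example labels its tokens with shared span names, and per-position scores are pooled by span so that examples of different lengths become comparable. The span names, the half-open index convention, and pooling by summation are assumptions made for illustration; this summary does not state how the paper aggregates scores within a span.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


def scores_by_span(position_scores: List[float], spans: Dict[str, Span]) -> Dict[str, float]:
    """Pool per-position scores into one score per schema span (sum is an assumption)."""
    return {
        name: sum(position_scores[start:end])
        for name, (start, end) in spans.items()
    }


def mean_over_examples(per_example: List[Dict[str, float]]) -> Dict[str, float]:
    """Average span-level scores across examples that share the same schema."""
    totals: Dict[str, float] = defaultdict(float)
    for span_scores in per_example:
        for name, score in span_scores.items():
            totals[name] += score
    return {name: total / len(per_example) for name, total in totals.items()}


# Two examples of different lengths, labeled with the same hypothetical schema.
example_a = scores_by_span(
    [0.5, 0.5, 2.0, 1.0],
    {"subject": (0, 2), "verb": (2, 3), "object": (3, 4)},
)
example_b = scores_by_span(
    [3.0, 1.0, 1.0, 0.5, 0.25, 0.25],
    {"subject": (0, 1), "verb": (1, 3), "object": (3, 6)},
)

dataset_scores = mean_over_examples([example_a, example_b])
```

Because the averaging happens over span names and not raw token indices, a 4-token and a 6-token example contribute to the same three entries.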