On Mechanistic Circuits for Extractive Question-Answering

📅 2025-02-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the internal mechanisms of large language models (LLMs) in extractive question answering (QA). The authors extract mechanistic circuits, expressed as a function of internal components such as attention heads and MLPs, using causal mediation analysis, and use them to study how models balance parametric memory against retrieved context. A key finding is that a small set of attention heads in the circuit performs reliable data attribution by default, so attribution can be read off in a single forward pass with no backpropagation. Building on this, the paper introduces ATTNATTRIB, a fast data attribution algorithm that achieves state-of-the-art results across several extractive QA benchmarks, and shows that its attributions can be fed back as an additional signal during the forward pass to steer the model toward answering from the context rather than from parametric memory.

📝 Abstract
Large language models are increasingly used to process documents and facilitate question-answering on them. In our paper, we extract mechanistic circuits for this real-world language modeling task: context-augmented language modeling for extractive question-answering (QA), and study the potential benefits of circuits for downstream applications such as data attribution to context information. We extract circuits as a function of internal model components (e.g., attention heads, MLPs) using causal mediation analysis techniques. Leveraging the extracted circuits, we first examine the interplay between the model's usage of parametric memory and retrieved context, towards a better mechanistic understanding of context-augmented language models. We then identify a small set of attention heads in our circuit which perform reliable data attribution by default, thereby obtaining attribution for free in just the model's forward pass. Using this insight, we introduce ATTNATTRIB, a fast data attribution algorithm which obtains state-of-the-art attribution results across various extractive QA benchmarks. Finally, we show that the language model can be steered towards answering from the context, instead of from the parametric memory, by using the attribution from ATTNATTRIB as an additional signal during the forward pass. Beyond mechanistic understanding, our paper provides tangible applications of circuits in the form of reliable data attribution and model steering.
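The abstract's central observation is that the attention weights of a few identified heads already act as a data attribution signal. A minimal sketch of that idea (not the paper's exact ATTNATTRIB implementation; the head selection, attention values, and sentence spans below are hypothetical): take one identified head's attention from the answer token over the context tokens, sum it per context sentence, and attribute the answer to the sentence with the largest mass.

```python
import numpy as np

def attribute_from_attention(attn_weights, sentence_spans):
    """Attribute an answer to a context sentence via one head's attention.

    attn_weights: 1-D array of attention from the answer token to each
        context token, for a single identified attention head.
    sentence_spans: list of (start, end) token-index pairs, one per
        context sentence.
    Returns the index of the sentence with the highest attention mass.
    """
    masses = [attn_weights[start:end].sum() for start, end in sentence_spans]
    return int(np.argmax(masses))

# Toy example: 10 context tokens split into three sentences.
attn = np.array([0.01, 0.02, 0.02, 0.05, 0.40, 0.35, 0.05, 0.04, 0.03, 0.03])
spans = [(0, 4), (4, 7), (7, 10)]
print(attribute_from_attention(attn, spans))  # → 1 (tokens 4-6 dominate)
```

Because this reuses attention weights that the forward pass computes anyway, attribution comes essentially for free, which is what makes the approach fast compared to gradient-based attribution methods.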
Problem

Research questions and friction points this paper is trying to address.

How do LLMs combine parametric memory with retrieved context in extractive QA?
Can reliable data attribution to context be obtained cheaply, using attention heads alone?
Can attribution signals be used to steer models towards answering from the context?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts mechanistic circuits using causal mediation analysis.
Introduces ATTNATTRIB, a fast forward-pass data attribution algorithm.
Steers models towards answering from context using ATTNATTRIB attributions as a signal.