IBCircuit: Towards Holistic Circuit Discovery with Information Bottleneck

📅 2026-02-25
🤖 AI Summary
Existing circuit discovery methods often overlook the holistic nature of circuits and rely on corrupted activations designed by hand for each task, which is inaccurate and inefficient. This work proposes IBCircuit, an end-to-end optimization framework based on the Information Bottleneck principle that identifies the informative computational subgraph (circuit) a language model uses for a given task. Because it does not require task-specific corrupted activations, IBCircuit can be applied to any given task. On the Indirect Object Identification (IOI) and Greater-Than tasks, IBCircuit identifies circuits that are more faithful and more minimal, in terms of critical node and edge components, than those found by recent related work.

📝 Abstract
Circuit discovery has recently attracted attention as a potential research direction to explain the non-trivial behaviors of language models. It aims to find the computational subgraphs, also known as circuits, within the model that are responsible for solving specific tasks. However, most existing studies overlook the holistic nature of these circuits and require designing specific corrupted activations for different tasks, which is inaccurate and inefficient. In this work, we propose an end-to-end approach based on the principle of Information Bottleneck, called IBCircuit, to identify informative circuits holistically. IBCircuit is an optimization framework for holistic circuit discovery and can be applied to any given task without the tedious design of corrupted activations. On both the Indirect Object Identification (IOI) and Greater-Than tasks, IBCircuit identifies more faithful and minimal circuits, in terms of critical node components and edge components, than recent related work.
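The abstract does not give the objective itself, so the following is only a rough sketch of what an Information Bottleneck style trade-off over model components might look like, not the paper's formulation. It assumes each component has a learnable gate that blends its activation with Gaussian noise matched to that activation's statistics, and it uses the mean gate value as a crude stand-in for the compression term. The names `bottleneck_mix`, `ib_style_objective`, `gate_logits`, and `beta` are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bottleneck_mix(h, gate_logits, rng):
    """Blend each component's activation with noise (illustrative assumption).

    h: (n_components, d) array of hypothetical per-component activations.
    gate_logits: (n_components,) array; these would be learned in a real setup.
    """
    lam = sigmoid(gate_logits)[:, None]            # gate in (0, 1) per component
    mu = h.mean(axis=1, keepdims=True)
    sigma = h.std(axis=1, keepdims=True)
    # Gaussian noise with the same per-component mean and spread as h.
    noise = rng.normal(size=h.shape) * sigma + mu
    # An open gate (lam near 1) passes h through; a closed gate replaces it with noise.
    return lam * h + (1.0 - lam) * noise

def ib_style_objective(task_loss, gate_logits, beta):
    """Task loss plus beta times mean gate openness (a sparsity proxy, not a true MI bound)."""
    return task_loss + beta * float(sigmoid(gate_logits).mean())

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))
kept = bottleneck_mix(h, np.full(4, 50.0), rng)      # gates fully open
dropped = bottleneck_mix(h, np.full(4, -50.0), rng)  # gates fully closed
```

In a setup like this, components whose gates can close without raising the task loss would be treated as outside the circuit, so no task-specific corrupted inputs need to be designed by hand.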
Problem

Research questions and friction points this paper is trying to address.

circuit discovery
information bottleneck
language models
computational subgraphs
holistic circuits
Innovation

Methods, ideas, or system contributions that make the work stand out.

Information Bottleneck
Circuit Discovery
Holistic Interpretability
Language Models
End-to-End Optimization