Skill Path: Unveiling Language Skills from Circuit Graphs

📅 2024-10-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing circuit discovery methods have two key limitations: atomic ablation disrupts the causal dependencies among connected components, and the discovery process does not adequately control for confounding effects, which leads to inaccurate skill attribution. To address this, we propose "skill paths", a linear, disentangled representation of language skills that isolates individual skills from circuit graphs. We introduce a three-stage extraction framework (decomposition, pruning, and causal mediation) built on a complete linear decomposition of the Transformer. Using counterfactuals and interventions, we extract three generic skill paths (Previous Token Skill, Induction Skill, and In-Context Learning Skill) across multiple models. Experiments support two properties of these skills: stratification and inclusiveness.

📝 Abstract
Circuit graph discovery has emerged as a fundamental approach to elucidating the skill mechanisms of language models. Although circuit graphs are faithful to the model's output, they suffer from atomic ablation, which causes the loss of causal dependencies between connected components. In addition, their discovery process, designed to preserve output faithfulness, inadvertently captures extraneous effects beyond the isolated target skill. To alleviate these challenges, we introduce skill paths, which offer a more refined and compact representation by isolating individual skills within a linear chain of components. To enable skill path extraction from circuit graphs, we propose a three-step framework consisting of decomposition, pruning, and post-pruning causal mediation. In particular, we offer a complete linear decomposition of the transformer model, which leads to a disentangled computation graph. After pruning, we further adopt causal analysis techniques, including counterfactuals and interventions, to extract the final skill paths from the circuit graph. To underscore the significance of skill paths, we investigate three generic language skills (Previous Token Skill, Induction Skill, and In-Context Learning Skill) using our framework. Experiments support two crucial properties of these skills, namely stratification and inclusiveness.
Problem

Research questions and friction points this paper is trying to address.

Circuit graphs lose causal dependencies during atomic ablation
Circuit graphs capture extraneous effects beyond target skills
Skill paths isolate individual skills through linear component chains
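
The difference between ablating a whole component and ablating a single path can be illustrated with a toy example. The sketch below is not the paper's method: it assumes a hypothetical residual model with two scalar linear components, A and B, and every name and value in it is made up for illustration. Because the toy model is linear, its output expands exactly into a sum over paths, so removing one path leaves the others untouched, whereas removing component A removes every path that passes through it.

```python
# Toy residual "model" with two scalar linear components, A and B.
# Hypothetical illustration only; not the decomposition used in the paper.

def forward(x, a, b, ablate_a=False, ablate_b=False):
    """Residual forward pass: each component reads the stream and adds to it."""
    stream = x
    if not ablate_a:
        stream = stream + a * stream  # component A writes a * x
    if not ablate_b:
        stream = stream + b * stream  # component B reads x plus A's output
    return stream


def path_contributions(x, a, b):
    """Expand the same computation as a sum over linear paths."""
    return {
        "direct": x,        # input -> output
        "A": a * x,         # input -> A -> output
        "B": b * x,         # input -> B -> output
        "A->B": b * a * x,  # input -> A -> B -> output
    }


x, a, b = 1.0, 2.0, 3.0
paths = path_contributions(x, a, b)

full = forward(x, a, b)                   # all paths active
atomic = forward(x, a, b, ablate_a=True)  # removes both "A" and "A->B"
path_only = full - paths["A->B"]          # removes only the chained path

# In this linear toy model the path expansion reproduces the forward pass.
assert full == sum(paths.values())

print("full output:        ", full)
print("atomic ablation (A):", atomic)
print("path ablation A->B: ", path_only)
```

With these values the full output is 12.0, ablating component A gives 4.0, and ablating only the A->B path gives 6.0. The atomic ablation cannot tell the contribution of A alone apart from the contribution of A feeding into B.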
Innovation

Methods, ideas, or system contributions that make the work stand out.

Skill paths isolate skills in linear chains
Three-step framework decomposes and prunes circuits
Causal mediation extracts paths using counterfactuals and interventions
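
As a rough illustration of how counterfactuals and interventions can separate the effect that flows through one component from the effect that bypasses it, the sketch below runs a textbook-style mediation analysis on a toy linear model. It is an assumption-laden example, not the paper's procedure: `mediator`, `outcome`, and the coefficients are hypothetical, and the clean decomposition of the total effect into direct plus indirect effects holds here only because the toy model is linear.

```python
# Toy causal mediation: x influences y directly and through a mediator m.
# Hypothetical names and coefficients, chosen only for illustration.

def mediator(x, a=2.0):
    """Intermediate component: m = a * x."""
    return a * x


def outcome(x, m, c=1.0, d=3.0):
    """Output reads both the input and the mediator: y = c * x + d * m."""
    return c * x + d * m


def mediation_effects(x_clean, x_cf):
    """Compare the clean run against counterfactual interventions."""
    base = outcome(x_clean, mediator(x_clean))
    # Total effect: swap the input everywhere.
    total = outcome(x_cf, mediator(x_cf)) - base
    # Indirect effect: keep the clean input, patch in the counterfactual mediator.
    indirect = outcome(x_clean, mediator(x_cf)) - base
    # Direct effect: swap the input, but hold the mediator at its clean value.
    direct = outcome(x_cf, mediator(x_clean)) - base
    return {"total": total, "indirect": indirect, "direct": direct}


effects = mediation_effects(x_clean=1.0, x_cf=0.0)
print(effects)
```

For the values above, the total effect is -7.0, of which -6.0 is mediated and -1.0 is direct. A path whose indirect effect is close to zero under the counterfactual input would be a candidate for removal in a setup like this one.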
Hang Chen
School of Computer Science and Technology, Xi’an Jiaotong University
Jiaying Zhu
School of Computer Science and Engineering, The Chinese University of Hong Kong
Xinyu Yang
School of Computer Science and Technology, Xi’an Jiaotong University
Wenya Wang
Nanyang Technological University
Deep Learning, Knowledge Reasoning, Natural Language Processing, Sentiment Analysis