🤖 AI Summary
This work challenges the Functional Anisotropy Hypothesis in large language models: the assumption that a given task is implemented by a unique or nearly unique internal mechanism. The authors propose Overlap-Aware Sheaf Repulsion, a method that adds an explicit penalty on structural overlap across multiple runs of circuit and sheaf discovery (CSD). Empirically, they find multiple high-performing, low-overlap circuits or sheaves for the same task, and they identify an ultra-sparse three-edge sheaf in which no single edge is indispensable. A theoretical analysis shows that such non-unique, low-overlap explanations arise naturally from high-dimensional superposition under mild assumptions. The results support the proposed Distributive Dense Circuit Hypothesis: a single task can be realized by several structurally distinct mechanisms, each of which is faithful, sparse, and complete, with no canonical core component.
📝 Abstract
In this paper, we present empirical and theoretical evidence against a central but largely implicit assumption in circuit and sheaf discovery (CSD), which we term the Functional Anisotropy Hypothesis: the idea that functions in large language models (LLMs) are localised to a unique or near-unique internal mechanism. We show that a single LLM task can instead be supported by multiple, structurally distinct circuits or sheaves that are simultaneously faithful, sparse, and complete. To systematically uncover such competing mechanisms, we introduce Overlap-Aware Sheaf Repulsion, a method that augments the CSD objective with an explicit penalty on structural overlap across multiple discovery runs, enabling the discovery of circuits or sheaves with strong task performance but minimal shared structure across a plethora of common CSD benchmarks. We find that this phenomenon becomes increasingly pronounced as the number of discovered sheaves grows and persists robustly across major CSD methods. We further identify an ultra-sparse three-edge sheaf and show that none of its edges is individually indispensable, undermining even weakened notions of canonical or essential components. To explain these findings, we propose a Distributive Dense Circuit Hypothesis and provide a theoretical analysis demonstrating that non-unique, low-overlap circuit explanations arise naturally from high-dimensional superposition under mild assumptions. Together, our results suggest that mechanistic explanations in LLMs are inherently non-canonical and call for a rethinking of how CSD results should be interpreted and evaluated.
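The idea of augmenting a discovery objective with an overlap penalty can be illustrated with a minimal sketch. The abstract does not specify the form of the penalty, so everything below is an assumption made for illustration: circuits are represented as sets of edges, overlap is measured with the Jaccard index, and the names `jaccard_overlap`, `repulsion_penalty`, `penalized_objective`, and the weight `lam` are hypothetical rather than taken from the paper.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical representation: a circuit is a set of (source, target) edges.
Edge = Tuple[str, str]
Circuit = FrozenSet[Edge]


def jaccard_overlap(a: Circuit, b: Circuit) -> float:
    """Return |a ∩ b| / |a ∪ b|, or 0.0 when both circuits are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def repulsion_penalty(candidate: Circuit, previous: Iterable[Circuit]) -> float:
    """Sum the overlap between the candidate and each earlier circuit."""
    return sum(jaccard_overlap(candidate, p) for p in previous)


def penalized_objective(
    task_loss: float,
    candidate: Circuit,
    previous: Iterable[Circuit],
    lam: float = 1.0,
) -> float:
    """Task loss plus a weighted overlap penalty (assumed form, not the paper's)."""
    return task_loss + lam * repulsion_penalty(candidate, previous)


# Toy example: one circuit found in an earlier run, and two new candidates.
first = frozenset({("a0", "m1"), ("m1", "out")})
same = frozenset({("a0", "m1"), ("m1", "out")})      # reuses every edge
disjoint = frozenset({("a2", "m3"), ("m3", "out")})  # shares no edge

score_same = penalized_objective(0.10, same, [first])
score_disjoint = penalized_objective(0.12, disjoint, [first])
```

In this toy setting the disjoint candidate has a slightly worse task loss but a lower penalized score, so a search that minimizes the penalized objective would prefer it over the circuit that repeats the earlier one. A real CSD method would typically optimize continuous edge masks rather than compare discrete sets, which this sketch does not attempt.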