Tree SAE: Learning Hierarchical Feature Structures in Sparse Autoencoders

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing sparse autoencoders identify hierarchy through the activation coverage assumption alone, which often yields spurious parent-child relationships between semantically unrelated features and fails to capture the structured nature of real-world data. This work proposes Tree SAE, a model that combines a reconstruction consistency constraint with activation coverage to enforce functionally coherent, tree-structured dependencies between features at different hierarchical levels. The approach reduces hierarchical misassignments and enables an interpretable mapping of the geometry of child feature subspaces within large language models. Experiments show that Tree SAE significantly outperforms existing SAEs at learning hierarchical feature pairs while remaining competitive with state-of-the-art models on several key benchmarks.

📝 Abstract

Learning hierarchical features in Sparse Autoencoders (SAEs) is essential for capturing the structured nature of real-world data and mitigating issues like feature absorption or splitting. Existing works attempt to identify hierarchical relationships within independent feature sets by relying on activation coverage, the assumption that a child feature should only activate when its parent feature activates. However, we demonstrate that this condition alone is insufficient: it often produces false positives in which parent and child concepts are semantically unrelated. To address this, we introduce a novel reconstruction condition that enforces a deeper functional link between hierarchical levels. By combining the activation and reconstruction constraints, we propose the Tree SAE, a model designed to learn hierarchical structures directly from within the feature set. Our results demonstrate that Tree SAEs significantly surpass existing SAEs at learning hierarchical pairs while remaining competitive with the state of the art on several key benchmarks. Finally, we demonstrate the practical utility of the Tree SAE in mapping the geometry of child feature subspaces and uncovering the complex hierarchical concept structures encoded within large language models.
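
To make the activation coverage assumption concrete, the following is a minimal sketch of how one might score candidate parent-child pairs from a matrix of SAE feature activations. It is not the paper's implementation: the function names (`activation_coverage`, `candidate_pairs`), the activity threshold, and the toy activation matrix are all hypothetical, chosen only for illustration. The reconstruction condition is not specified in this abstract, so it is deliberately left out.

```python
import numpy as np


def activation_coverage(acts, threshold=0.0):
    """Return coverage[p, c]: among samples where feature c is active,
    the fraction in which feature p is also active.

    `acts` has shape (n_samples, n_features). The threshold that defines
    "active" is an assumption made for this sketch.
    """
    active = (acts > threshold).astype(np.float64)
    co_active = active.T @ active          # [p, c] = samples with both active
    child_counts = active.sum(axis=0)      # samples where each feature fires
    coverage = np.divide(
        co_active,
        child_counts[None, :],
        out=np.zeros_like(co_active),
        where=child_counts[None, :] > 0,   # features that never fire score 0
    )
    np.fill_diagonal(coverage, 0.0)        # a feature is not its own parent
    return coverage


def candidate_pairs(coverage, min_coverage=1.0):
    """List (parent, child) index pairs whose coverage meets the cutoff."""
    return [(int(p), int(c)) for p, c in np.argwhere(coverage >= min_coverage)]


# Toy activations (hypothetical): feature 1 fires only when feature 0 fires,
# while feature 2 is a dense feature that fires on every sample.
toy_acts = np.array([
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=np.float64)

pairs = candidate_pairs(activation_coverage(toy_acts))
print(pairs)
```

In this toy setup the dense feature 2 satisfies the coverage test as a parent of every other feature simply because it is always active, regardless of whether it is semantically related to them. This is the kind of false positive the abstract attributes to using activation coverage alone, and it is what an additional functional constraint would need to rule out.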

Problem

Research questions and friction points this paper is trying to address.

Sparse Autoencoders

Hierarchical Features

Feature Absorption

Activation Coverage

Reconstruction Condition

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Autoencoders

Hierarchical Features

Reconstruction Condition