A framework for analyzing concept representations in neural models

📅 2026-05-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of a unified framework for evaluating linear subspace representations of human-interpretable concepts in neural models. The authors propose a two-axis framework built on “containment” and “disentanglement” and use it to compare five subspace estimators drawn from lines of work such as probing and concept erasure (including LEACE), in experiments on both text and speech models such as HuBERT. Their findings show that concept subspaces may not be uniquely determined and that the choice of estimator affects the containment and disentanglement properties of the estimated subspace. LEACE performs well on both axes but still struggles to generalize to unseen data. In HuBERT, phone information is both contained in a subspace and disentangled from speaker information, whereas speaker information is hard to contain in a compact subspace even though it is disentangled from phones.

📝 Abstract

Understanding how neural models represent human-interpretable concepts is challenging. Prior work has explored linear concept subspaces from diverse perspectives, such as probing and concept erasure. We introduce a unified framework to study these subspaces along two axes: “containment”, which tests if a concept is fully represented in a subspace but not outside it, and “disentanglement”, which tests for isolation from other concepts. In experiments on both text and speech models, we first highlight that concept subspaces may not be uniquely determined, and discuss the implications for concept subspace analysis. Then, we compare properties of concept subspaces estimated using five estimators, proposed in different communities. We find that (1) the choice of estimator impacts the containment and disentanglement properties; (2) the state-of-the-art concept erasure method, LEACE, performs well on both testing axes, but still struggles to generalize to unseen data; and (3) in HuBERT speech representations, phone information is both contained and disentangled from speaker information, while speaker information is hard to contain in a compact subspace, despite being disentangled from phones.
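
To make the two axes concrete, here is a minimal sketch on synthetic data. It is not the paper's implementation. The mean-difference estimator, the least-squares probe, the toy data, and every function name below are assumptions chosen for illustration, and none of the five estimators compared in the paper is reproduced. The sketch estimates a one-dimensional subspace for concept A on a training split, then probes held-out data for A inside the subspace, for A outside it (containment), and for an unrelated concept B inside it (disentanglement).

```python
import numpy as np


def mean_difference_subspace(x, labels):
    """Hypothetical estimator: a 1-D subspace along the difference of class means."""
    direction = x[labels == 1].mean(axis=0) - x[labels == 0].mean(axis=0)
    return (direction / np.linalg.norm(direction))[:, None]  # shape (dim, 1)


def split_by_subspace(x, basis):
    """Return the parts of x inside and outside the span of an orthonormal basis."""
    inside = x @ basis @ basis.T
    return inside, x - inside


def probe_accuracy(x_train, y_train, x_test, y_test):
    """Fit a least-squares linear probe on the training split, score it on the test split."""
    def add_bias(x):
        return np.hstack([x, np.ones((len(x), 1))])

    # Regress onto targets in {-1, +1}; lstsq returns the minimum-norm solution.
    weights, *_ = np.linalg.lstsq(add_bias(x_train), 2.0 * y_train - 1.0, rcond=None)
    predictions = add_bias(x_test) @ weights > 0
    return float(np.mean(predictions == (y_test == 1)))


def make_toy_data(n=2000, dim=8, seed=0):
    """Toy representations: concept A lives on axis 0, concept B on axis 1, plus noise."""
    rng = np.random.default_rng(seed)
    concept_a = rng.integers(0, 2, size=n)
    concept_b = rng.integers(0, 2, size=n)
    x = 0.1 * rng.normal(size=(n, dim))
    x[:, 0] += 2.0 * concept_a - 1.0
    x[:, 1] += 2.0 * concept_b - 1.0
    return x, concept_a, concept_b


x, concept_a, concept_b = make_toy_data()
half = len(x) // 2
train, test = slice(0, half), slice(half, None)

# Estimate the subspace for concept A on the training split only.
basis_a = mean_difference_subspace(x[train], concept_a[train])
inside, outside = split_by_subspace(x, basis_a)

# Containment: A should be decodable inside the subspace and not outside it.
a_inside = probe_accuracy(inside[train], concept_a[train], inside[test], concept_a[test])
a_outside = probe_accuracy(outside[train], concept_a[train], outside[test], concept_a[test])
# Disentanglement: the unrelated concept B should not be decodable inside A's subspace.
b_inside = probe_accuracy(inside[train], concept_b[train], inside[test], concept_b[test])

print(f"A inside: {a_inside:.2f}  A outside: {a_outside:.2f}  B inside: {b_inside:.2f}")
```

In this toy setting the probe for A should score close to 1.0 inside the subspace and close to chance outside it, and the probe for B should stay close to chance inside A's subspace. Real model representations are unlikely to be this clean, and a different estimator can return a different subspace with different scores, which is the comparison the paper is concerned with.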

Problem

Research questions and friction points this paper is trying to address.

concept representation

neural models

containment

disentanglement

concept subspaces

Innovation

Methods, ideas, or system contributions that make the work stand out.

concept subspace

containment

disentanglement