AI Summary
This work addresses the problem that interpretable concept extraction in neural networks relies heavily on sparse autoencoders (SAEs) and lacks theoretical grounding. We propose an unsupervised differential clustering method that formalizes Deleuze's philosophical principle that "concepts are differences" into a computational framework: neuron activation differences are modeled via discriminant analysis, and activation-skewness weighting is introduced to enhance concept diversity. The method is cross-modal, applying uniformly to vision, language, and audio tasks. Evaluated on five mainstream models, it yields concepts of significantly higher quality than existing unsupervised SAE variants, approaching supervised baselines, while enabling causal intervention and effective behavioral control of neural models.
Abstract
We propose an alternative to sparse autoencoders (SAEs) as a simple and effective unsupervised method for extracting interpretable concepts from neural networks. The core idea is to cluster differences in activations, which we formally justify within a discriminant analysis framework. To enhance the diversity of extracted concepts, we refine the approach by weighting the clustering using the skewness of activations. The method aligns with Deleuze's modern view of concepts as differences. We evaluate the approach across five models and three modalities (vision, language, and audio), measuring concept quality, diversity, and consistency. Our results show that the proposed method achieves concept quality surpassing prior unsupervised SAE variants while approaching supervised baselines, and that the extracted concepts enable steering of a model's inner representations, demonstrating their causal influence on downstream behavior.
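The abstract outlines a pipeline of computing activation differences, weighting features by activation skewness, and clustering the result. The sketch below is a minimal, hypothetical illustration of that pipeline; the specific choices here (consecutive-sample differences, absolute-skewness weights, k-means, and all variable names) are assumptions for demonstration, not the paper's exact formulation.

```python
import numpy as np
from scipy.stats import skew
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
acts = rng.standard_normal((200, 16))  # (samples, neurons): hypothetical activations

# Differences between consecutive samples' activations (one simple choice
# of "activation differences"; the paper's construction may differ).
diffs = np.diff(acts, axis=0)          # shape (199, 16)

# Weight each neuron by the absolute skewness of its activation distribution,
# so strongly skewed (more "selective") neurons count more in clustering.
weights = np.abs(skew(acts, axis=0))   # shape (16,)
weighted = diffs * weights

# Cluster the weighted differences; centroids serve as candidate concept directions.
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(weighted)
concepts = km.cluster_centers_         # shape (8, 16)
```

Each centroid can then be used as a steering direction by adding or subtracting it from a model's hidden activations, matching the causal-intervention use the abstract describes.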