Mechanistic Interpretability of Cognitive Complexity in LLMs via Linear Probing using Bloom's Taxonomy

📅 2026-02-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the limited understanding of how large language models internally represent tasks of varying cognitive complexity, and the absence of an interpretable evaluation framework for studying it. It introduces Bloom's Taxonomy into mechanistic interpretability research for the first time, constructing cognitive-level labels to investigate whether distinct cognitive processes, such as remembering, understanding, and creating, are linearly separable within the model's residual-stream activations. Using linear probing, the authors analyze activation vectors layer by layer and find that a linear classifier achieves 95% average accuracy across all cognitive levels. Probing across layers shows that cognitive difficulty is encoded early in the forward pass and becomes increasingly separable with network depth, demonstrating that cognitive complexity has a linearly decodable structure in the model's internal representations.
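
The probing setup described above reduces to training a linear classifier on frozen activations. The following is a minimal sketch under stated assumptions: the residual-stream vectors are taken as already extracted, and the arrays (`activations`, `labels`) are placeholders standing in for the paper's Bloom-labeled prompt set, not real data.

```python
# Minimal linear-probe sketch: placeholder data stands in for real
# residual-stream activations and Bloom-level labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
activations = rng.normal(size=(600, 768))  # (n_prompts, d_model) at one layer
labels = rng.integers(0, 6, size=600)      # Bloom level: 0=Remember ... 5=Create

X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.2, random_state=0, stratify=labels
)

# A linear probe is just a linear classifier fit on frozen activations;
# high held-out accuracy means the label is linearly decodable at this layer.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print(f"probe accuracy: {accuracy_score(y_test, probe.predict(X_test)):.3f}")
```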

📝 Abstract
The black-box nature of Large Language Models necessitates novel evaluation frameworks that transcend surface-level performance metrics. This study investigates the internal neural representations of cognitive complexity using Bloom's Taxonomy as a hierarchical lens. By analyzing high-dimensional activation vectors from several LLMs, we probe whether cognitive levels, ranging from basic recall (Remember) to abstract synthesis (Create), are linearly separable within each model's residual stream. Our results demonstrate that linear classifiers achieve approximately 95% mean accuracy across all Bloom levels, providing strong evidence that cognitive level is encoded in a linearly accessible subspace of the model's representations. These findings indicate that the model resolves the cognitive difficulty of a prompt early in the forward pass, with representations becoming increasingly separable across layers.
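
The layer-wise claim (representations becoming increasingly separable with depth) is typically tested by training one probe per layer. The sketch below shows one way to do that with off-the-shelf tools; it is not the paper's code, and the model name ("gpt2"), the helper name, and the last-token pooling choice are all assumptions.

```python
# Hypothetical layer-wise sweep: one linear probe per layer, accuracy
# reported as a function of depth.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def layerwise_probe_accuracy(prompts, labels, model_name="gpt2"):
    """prompts: list[str]; labels: Bloom level per prompt (many per level)."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    feats = None  # feats[layer] -> list of last-token hidden-state vectors
    with torch.no_grad():
        for p in prompts:
            out = model(**tok(p, return_tensors="pt"), output_hidden_states=True)
            states = [h[0, -1].numpy() for h in out.hidden_states]
            if feats is None:
                feats = [[] for _ in states]
            for layer, vec in enumerate(states):
                feats[layer].append(vec)

    # Rising accuracy with depth would indicate the increasing linear
    # separability the abstract describes.
    return [
        cross_val_score(LogisticRegression(max_iter=1000), f, labels, cv=5).mean()
        for f in feats
    ]
```

On the paper's account, such per-layer accuracies should already be above chance at early layers and rise with depth, consistent with the reported ~95% mean probe accuracy.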
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Cognitive Complexity
Bloom's Taxonomy
Mechanistic Interpretability
Neural Representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mechanistic Interpretability
Bloom's Taxonomy
Linear Probing
Cognitive Complexity
Neural Representations