Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) expose little of their internal representations, and existing attribution methods, such as direct logit attribution and sparse autoencoders, are hampered by reliance on the output vocabulary and by ill-defined feature semantics. Method: The paper proposes the Hyperdimensional Probe, a neuro-symbolic probing framework that integrates Vector Symbolic Architectures (VSAs) with neural probing to project residual-stream activations onto semantically grounded, structurally traceable symbolic concepts. Contribution/Results: Unlike conventional probes that rely on predefined labels or the model's vocabulary, the method enables stable, unsupervised concept extraction across models, embedding dimensions, and tasks. It supports failure-mode identification and fine-grained, end-to-end state tracking during generation. Empirical evaluation across multiple LLMs demonstrates its effectiveness in decoding hierarchical semantic and syntactic structure, with strong explanatory power and robustness on question answering and controlled reasoning tasks.
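The summary's "semantically grounded, structurally traceable" concepts rest on the standard VSA toolkit: binding to pair roles with fillers, bundling to superpose pairs, and similarity-based cleanup against a codebook. The sketch below shows these textbook operations on bipolar hypervectors; it is not the paper's implementation, and the role/filler names (`SUBJECT`, `paris`, etc.) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # high dimensionality makes random hypervectors quasi-orthogonal

def hv():
    """Draw a random bipolar hypervector in {-1, +1}^D."""
    return rng.choice([-1, 1], size=D)

def bind(a, b):
    """Binding (element-wise product): pairs a role with a filler; self-inverse."""
    return a * b

def bundle(*vs):
    """Bundling (majority sign of the sum): superposes several bound pairs."""
    return np.sign(np.sum(vs, axis=0))

def sim(a, b):
    """Cosine similarity, used for cleanup against a concept codebook."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Encode a key-value record: SUBJECT=paris bundled with RELATION=capital_of.
SUBJECT, RELATION, paris, capital_of = hv(), hv(), hv(), hv()
record = bundle(bind(SUBJECT, paris), bind(RELATION, capital_of))

# Unbinding the SUBJECT role recovers a noisy copy of `paris`, recognizable
# by high similarity, while unrelated concepts stay near-orthogonal.
recovered = bind(record, SUBJECT)
```

Because binding with a bipolar vector is its own inverse, querying a role cleanly separates its filler from everything else bundled into the record, which is what makes the extracted concepts "structurally traceable".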

📝 Abstract
Despite their capabilities, Large Language Models (LLMs) remain opaque with limited understanding of their internal representations. Current interpretability methods, such as direct logit attribution (DLA) and sparse autoencoders (SAEs), provide restricted insight due to limitations such as the model's output vocabulary or unclear feature names. This work introduces Hyperdimensional Probe, a novel paradigm for decoding information from the LLM vector space. It combines ideas from symbolic representations and neural probing to project the model's residual stream into interpretable concepts via Vector Symbolic Architectures (VSAs). This probe combines the strengths of SAEs and conventional probes while overcoming their key limitations. We validate our decoding paradigm with controlled input-completion tasks, probing the model's final state before next-token prediction on inputs spanning syntactic pattern recognition, key-value associations, and abstract inference. We further assess it in a question-answering setting, examining the state of the model both before and after text generation. Our experiments show that our probe reliably extracts meaningful concepts across varied LLMs, embedding sizes, and input domains, also helping identify LLM failures. Our work advances information decoding in LLM vector space, enabling extracting more informative, interpretable, and structured features from neural representations.
Problem

Research questions and friction points this paper is trying to address.

Decoding opaque internal representations of Large Language Models
Overcoming vocabulary limitations in current interpretability methods
Extracting structured concepts from neural representations via symbolic projection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Projects residual stream into interpretable concepts
Uses Vector Symbolic Architectures for decoding
Combines strengths of SAEs and conventional probes
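The projection idea in the bullets above can be sketched minimally: map a residual-stream activation into hypervector space, then rank concepts from a codebook by cosine similarity. Everything here is a hypothetical stand-in: the codebook entries, the dimensions, and the random matrix `W` (the paper would fit this projection rather than draw it at random).

```python
import numpy as np

rng = np.random.default_rng(1)
D_MODEL, D_HV = 768, 10_000  # assumed residual-stream width and hypervector size

# Hypothetical concept codebook: one random bipolar hypervector per concept.
concepts = {name: rng.choice([-1, 1], size=D_HV)
            for name in ["animal", "city", "verb", "number"]}

# Placeholder linear map from model space to hypervector space; a real probe
# would be trained so that activations land near their associated concepts.
W = rng.standard_normal((D_HV, D_MODEL)) / np.sqrt(D_MODEL)

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def decode(activation, top_k=2):
    """Project a residual-stream vector and rank codebook concepts by similarity."""
    h = W @ activation
    scores = {name: cosine(h, c) for name, c in concepts.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

# Simulate an activation whose projection lies near the "city" hypervector:
# W.T @ c is the model-space direction that W maps back toward c.
activation = W.T @ concepts["city"]
top = decode(activation)  # "city" should rank first by a clear margin
```

Decoding by similarity against explicit concept vectors, rather than against the output vocabulary, is what lets this style of probe sidestep the vocabulary limitation noted in the Problem section.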