Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution

📅 2024-09-30
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the limited robustness of single-vector concept representations in large language models (LLMs). We propose the Gaussian Concept Subspace (GCS) framework, which relaxes the conventional linear-probe assumption that a concept is captured by a single vector and instead models each concept as a structured subspace within the model's latent space. GCS represents a concept as a multivariate Gaussian distribution parameterized by a mean vector and covariance matrix, enabling principled estimation of how representative a sampled vector is of the concept, as well as representation-level interventions such as emotion steering. This yields improved semantic stability and conceptual completeness. Extensive experiments across multiple LLM scales and architectures demonstrate that GCS achieves higher faithfulness and plausibility than single-vector baselines. In emotion-guided generation tasks, GCS balances control precision with text fluency, enabling more controllable and robust concept-level interventions.
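The core idea in the summary can be sketched in a few lines: fit many linear probes on resampled data, treat each probe's weight vector as one estimate of the concept direction, and then fit a Gaussian (mean plus covariance) over those directions from which new concept vectors can be sampled. The code below is a minimal illustrative sketch on synthetic data, not the authors' implementation; all names and hyperparameters are assumptions.

```python
# Illustrative sketch of the GCS idea: estimate a concept's Gaussian
# subspace from many bootstrapped linear probes. Synthetic data and
# all parameter choices here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-in for LLM hidden states: positive examples are shifted
# along a ground-truth concept direction.
dim, n = 16, 400
true_dir = rng.normal(size=dim)
true_dir /= np.linalg.norm(true_dir)
X_neg = rng.normal(size=(n, dim))
X_pos = rng.normal(size=(n, dim)) + 2.0 * true_dir
X = np.vstack([X_neg, X_pos])
y = np.array([0] * n + [1] * n)

# Fit linear probes on bootstrap resamples; each probe's normalized
# weight vector is one estimate of the concept direction.
dirs = []
for _ in range(50):
    idx = rng.integers(0, len(X), size=len(X))
    clf = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    w = clf.coef_[0]
    dirs.append(w / np.linalg.norm(w))
dirs = np.array(dirs)

# Gaussian concept subspace: mean direction and covariance over probes.
mu = dirs.mean(axis=0)
cov = np.cov(dirs, rowvar=False)

# Sample fresh concept vectors from the fitted Gaussian; they cluster
# around the mean direction and align with the true concept direction.
samples = rng.multivariate_normal(mu, cov, size=5)
print(np.mean(samples @ true_dir))
```

The covariance captures how much the probed direction varies with data and training, which is exactly the instability that motivates moving from a single vector to a subspace.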

📝 Abstract
Probing learned concepts in large language models (LLMs) is crucial for understanding how semantic knowledge is encoded internally. Training linear classifiers on probing tasks is a principal approach to identifying the vector representing a certain concept in the representation space. However, the single vector identified for a concept varies with both data and training, making it less robust and weakening its effectiveness in real-world applications. To address this challenge, we propose an approach to approximate the subspace representing a specific concept. Building on linear probing classifiers, we extend concept vectors into a Gaussian Concept Subspace (GCS). We demonstrate GCS's effectiveness by measuring its faithfulness and plausibility across multiple LLMs of different sizes and architectures. Additionally, we use representation intervention tasks to showcase its efficacy in real-world applications such as emotion steering. Experimental results indicate that GCS concept vectors have the potential to balance steering performance with maintaining fluency in natural language generation tasks.
Problem

Research questions and friction points this paper is trying to address.

Modeling concept subspaces in LLMs for robust semantic representation.
Extending single concept vectors to Gaussian Concept Subspace (GCS).
Enhancing real-world applications like emotion steering in LLMs.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends concept vectors into Gaussian Concept Subspace
Measures GCS faithfulness and plausibility across various LLMs
Applies GCS in emotion steering tasks
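For the emotion-steering intervention listed above, a common representation-level mechanism is to add a scaled concept vector to a layer's hidden states during the forward pass. The PyTorch hook below is a generic sketch of that mechanism on a stand-in layer, assuming the vector was sampled from the GCS; it is not the paper's actual intervention code, and the scale `alpha` is a hypothetical hyperparameter.

```python
# Generic sketch of activation steering with a sampled concept vector:
# a forward hook shifts a layer's hidden states along the vector.
# The hook, layer, and alpha value are illustrative assumptions.
import torch

def make_steering_hook(concept_vec: torch.Tensor, alpha: float):
    """Forward hook that adds alpha * concept_vec to the layer output."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * concept_vec  # broadcast over batch
        if isinstance(output, tuple):
            return (steered,) + output[1:]
        return steered
    return hook

# Toy usage: a linear layer standing in for a transformer block.
torch.manual_seed(0)
layer = torch.nn.Linear(8, 8)
vec = torch.randn(8)
vec = vec / vec.norm()  # unit concept vector, e.g. drawn from the GCS

x = torch.randn(3, 8)
handle = layer.register_forward_hook(make_steering_hook(vec, alpha=2.0))
steered_out = layer(x)   # output shifted along the concept direction
handle.remove()
plain_out = layer(x)     # same input, no intervention
```

Sampling the steering vector from the fitted Gaussian, rather than reusing one probe's weights, is what the summary refers to as trading off control precision against fluency.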