🤖 AI Summary
This work challenges the cross-task transferability of “truth geometry”—the geometric structure in LLM representations that distinguishes the activations behind correct answers from those behind incorrect ones. The hypothesis is evaluated systematically using linear probing, sparsity-enforcing regularization, multi-task mixture probes, and cross-task activation similarity metrics (cosine similarity and support-set overlap) across multiple benchmark tasks. Results reveal near-orthogonal activation patterns for correct answers across tasks: task-specific probes achieve >90% accuracy, yet cross-task transfer drops to ~50% (chance level), and sparsity-regularized probes exhibit <2% support-set overlap. This is the first empirical demonstration of strong task specificity and orthogonality in truth geometry, undermining the feasibility of universal confidence calibration. The effect is attributed to cluster-wise separation of task-specific activation subspaces.
📝 Abstract
Large Language Models (LLMs) have demonstrated impressive generalization capabilities across various tasks, but their claim to practical relevance is still marred by concerns about their reliability. Recent works have proposed examining the activations produced by an LLM at inference time to assess whether its answer to a question is correct. Some works claim that a "geometry of truth" can be learned from examples, in the sense that the activations that generate correct answers can be distinguished from those leading to mistakes with a linear classifier. In this work, we underline a limitation of these approaches: we observe that these "geometries of truth" are intrinsically task-dependent and fail to transfer across tasks. More precisely, we show that linear classifiers trained across distinct tasks share little similarity and, when trained with sparsity-enforcing regularizers, have almost disjoint supports. We show that more sophisticated approaches (e.g., using mixtures of probes and tasks) fail to overcome this limitation, likely because the activation vectors commonly used to classify answers form clearly separated clusters when examined across tasks.
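The core evaluation protocol described above—training a linear probe on one task's activations and testing it on another, then comparing the supports of sparsity-regularized probes—can be illustrated with a minimal sketch. Everything here is hypothetical: the synthetic "activations" simply place each task's truth direction in an orthogonal subspace, which is the failure mode the paper reports, not the paper's actual data or models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 64  # activation dimension (hypothetical)

def make_task(truth_dir, n=400):
    """Synthetic 'activations': correct answers (y=1) are shifted along a
    task-specific truth direction, incorrect ones (y=0) in the opposite way."""
    y = rng.integers(0, 2, n)
    X = rng.normal(size=(n, d))
    X += np.outer(2 * y - 1, truth_dir) * 2.0
    return X, y

# Two tasks whose truth directions live in disjoint coordinate subspaces
# (i.e., orthogonal), mimicking the near-orthogonality finding.
dir_a = np.zeros(d); dir_a[:8] = 1 / np.sqrt(8)
dir_b = np.zeros(d); dir_b[8:16] = 1 / np.sqrt(8)

Xa, ya = make_task(dir_a)
Xb, yb = make_task(dir_b)

# Linear probe trained on task A: accurate in-task, at chance on task B.
probe_a = LogisticRegression(max_iter=1000).fit(Xa, ya)
in_task = probe_a.score(Xa, ya)     # high
cross_task = probe_a.score(Xb, yb)  # near 0.5 (chance)

# Sparse (L1) probes per task: compare supports via Jaccard overlap.
sparse_a = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(Xa, ya)
sparse_b = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(Xb, yb)
supp_a = set(np.flatnonzero(sparse_a.coef_[0]))
supp_b = set(np.flatnonzero(sparse_b.coef_[0]))
overlap = len(supp_a & supp_b) / max(1, len(supp_a | supp_b))  # near zero

print(f"in-task acc: {in_task:.2f}, cross-task acc: {cross_task:.2f}, "
      f"support overlap: {overlap:.2f}")
```

Because the two truth directions are orthogonal by construction, the task-A probe carries no signal about task-B labels, and the L1 supports land on disjoint coordinate blocks—the same qualitative pattern (>90% in-task, ~50% transfer, near-disjoint supports) the abstract reports for real LLM activations.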