🤖 AI Summary
This paper addresses the question of whether AI models possess genuine introspective capability. To resolve ambiguities in existing definitions, the authors propose a stricter “thick introspection” criterion: an agent introspects only if its access to its internal states is more reliable than any process of equal or lower computational cost available to a third-party observer. Using this definition, the study empirically evaluates large language models’ (LLMs) ability to self-report their internal temperature parameter, a concrete introspective task. Results show that while LLMs can appear introspective under weaker definitions, their self-reports are significantly less reliable than external measurements and thus fail the joint cost–reliability constraint. This work clarifies the conceptual boundaries of introspection, offers a quantitative test of self-referential reasoning in LLMs, and argues on empirical grounds that current LLMs lack substantive introspective capacity, providing both a theoretical framework and an empirical foundation for research on AI self-awareness.
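One way to state this criterion formally (a sketch; the symbols below are our gloss, not notation from the paper): write R(q) for the reliability of a process q, C(q) for its computational cost, and P_3rd(s) for the set of processes a third party could use to learn about internal state s.

```latex
% Hedged formalization of the thick-introspection criterion.
% R(q): reliability of process q; C(q): its computational cost;
% P_3rd(s): processes available to a third party for learning state s.
% These symbols are our own gloss, not notation from the paper.
\[
  \mathrm{Introspects}(A, s, p) \;\iff\;
  \forall q \in \mathcal{P}_{\mathrm{3rd}}(s)\,
  \bigl[\, C(q) \le C(p) \;\Rightarrow\; R(p) > R(q) \,\bigr]
\]
```

On this reading, a self-report that merely matches an external estimator, or beats it only at higher cost, does not count as introspection.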
📝 Abstract
Whether AI models can introspect is an increasingly important practical question. But there is no consensus on how introspection should be defined. Beginning from a recently proposed “lightweight” definition, we argue instead for a thicker one. On our proposal, introspection in AI is any process that yields information about an agent’s internal states more reliably than any process of equal or lower computational cost available to a third party. Using experiments in which LLMs reason about their internal temperature parameter, we show that models can appear to introspect in the lightweight sense while failing to introspect meaningfully under our proposed definition.
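To make the comparison concrete, here is a minimal sketch (our illustration, not the authors’ code) of the kind of third-party baseline the criterion invokes: estimating the temperature by maximum likelihood from observed samples, assuming the observer can see the model’s next-token logits, as many LLM APIs expose via logprobs. The model’s verbal self-report would have to beat this estimator’s reliability at equal or lower computational cost.

```python
# Illustrative sketch, not the authors' code: a third-party observer
# recovering an LLM's sampling temperature from its outputs alone.
# Assumptions: the observer sees the base logits for one next-token
# distribution (e.g., via API logprobs) and n sampled tokens.
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits, temperature):
    z = logits / temperature
    z = z - z.max()                      # stabilize before exponentiating
    p = np.exp(z)
    return p / p.sum()

def sample_tokens(logits, temperature, n):
    """Stand-in for collecting n next-token samples from the model."""
    return rng.choice(len(logits), size=n, p=softmax(logits, temperature))

def third_party_estimate(logits, samples, grid=np.linspace(0.1, 2.0, 96)):
    """Grid-search maximum-likelihood estimate of the temperature."""
    counts = np.bincount(samples, minlength=len(logits))
    log_lik = [counts @ np.log(softmax(logits, t)) for t in grid]
    return grid[int(np.argmax(log_lik))]

true_temp = 0.7
logits = rng.normal(size=50)             # toy next-token logits
samples = sample_tokens(logits, true_temp, n=500)
print(f"third-party MLE of temperature: "
      f"{third_party_estimate(logits, samples):.2f}")

# The introspective condition replaces this estimator with the model's
# own verbal report of its temperature; the thick criterion asks whether
# that report is more reliable at equal or lower computational cost.
```

Grid-search MLE is just one cheap observer strategy; any third-party process of equal or lower cost would serve equally well as the comparison class.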