Does It Make Sense to Speak of Introspection in Large Language Models?

📅 2025-06-05
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper investigates whether large language models (LLMs) possess a capacity that can legitimately be termed “introspection.”

Method: We conduct empirical self-query experiments on two classes of LLM-generated self-reports (creative-writing attribution and temperature-parameter inference), integrating prompt engineering, behavioral analysis, and philosophical conceptual clarification.

Contribution/Results: We introduce and substantiate the notion of “minimal introspection”: an LLM’s accurate inference of its own hyperparameters (e.g., temperature) constitutes an unconscious yet formally valid form of introspection, challenging the traditional view that introspection necessitates consciousness. Building on this, we propose an operational distinction between “quasi-introspection” (behaviorally mimicked self-reporting) and “substantive introspection” (grounded in reliable, stable self-modeling). This framework advances the theoretical understanding of LLM self-modeling capabilities and broadens the paradigm for AI self-awareness research.

📝 Abstract
Large language models (LLMs) exhibit compelling linguistic behaviour, and sometimes offer self-reports, that is to say, statements about their own nature, inner workings, or behaviour. In humans, such reports are often attributed to a faculty of introspection and are typically linked to consciousness. This raises the question of how to interpret self-reports produced by LLMs, given their increasing linguistic fluency and cognitive capabilities. To what extent (if any) can the concept of introspection be meaningfully applied to LLMs? Here, we present and critique two examples of apparent introspective self-report from LLMs. In the first example, an LLM attempts to describe the process behind its own "creative" writing, and we argue this is not a valid example of introspection. In the second example, an LLM correctly infers the value of its own temperature parameter, and we argue that this can be legitimately considered a minimal example of introspection, albeit one that is (presumably) not accompanied by conscious experience.
Problem

Research questions and friction points this paper is trying to address.

Can the concept of introspection meaningfully apply to LLMs' self-reports?
Do LLMs' self-reports indicate consciousness, as analogous reports do in humans?
How should introspective claims from linguistically fluent LLMs be interpreted?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzing LLM self-reports for introspection
Critiquing creative writing introspection claims
Validating temperature-parameter inference as a minimal example of introspection
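The temperature-inference idea can be illustrated with a toy sketch. This is not the paper's actual experimental setup: here a "model" samples tokens from fixed, made-up logits at a hidden temperature, and an inference step recovers that temperature from output statistics alone (maximum likelihood over candidate values) — the sense in which a hyperparameter can be inferred purely from a model's own behaviour.

```python
import math
import random

def softmax(logits, temperature):
    """Convert logits to a sampling distribution at a given temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_tokens(logits, temperature, n, rng):
    """Draw n token indices from the tempered distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=n)

def infer_temperature(logits, samples, candidates):
    """Pick the candidate temperature that maximizes the likelihood of
    the observed samples -- a crude 'self-inference' of the sampling
    hyperparameter from output statistics alone."""
    def log_likelihood(t):
        probs = softmax(logits, t)
        return sum(math.log(probs[tok]) for tok in samples)
    return max(candidates, key=log_likelihood)

rng = random.Random(0)
logits = [2.0, 1.0, 0.5, 0.0]       # toy next-token logits (illustrative)
true_temperature = 1.5              # hidden from the inference step
samples = sample_tokens(logits, true_temperature, 5000, rng)
guess = infer_temperature(logits, samples, [0.2, 0.7, 1.5, 2.0])
```

With enough samples the likelihood gap between well-separated candidate temperatures dominates sampling noise, so the hidden value is recovered reliably; a real LLM answering "what is your temperature?" would have to achieve something analogous from its own generated text.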