🤖 AI Summary
This work investigates whether large language models (LLMs) possess self-recognition capability and examines its implications for AI safety. We introduce the first scalable evaluation framework for AI self-recognition, comprising two tasks: binary self-recognition (distinguishing self-generated from other-model-generated text) and precise model attribution (identifying the exact generating model). We systematically evaluate ten mainstream LLMs on these tasks. Results show that only four models exhibit statistically significant self-recognition performance; most fail to surpass random baselines. Moreover, we observe strong prediction biases toward GPT- and Claude-family models, indicating a lack of stable self-referential awareness. To our knowledge, this is the first empirical characterization of systematic deficits in “model-level cognition” among LLMs. The findings establish a critical benchmark for identity trustworthiness, content provenance, and accountability in AI safety, offering both methodological and theoretical insight into foundational limitations of current LLMs.
📝 Abstract
Self-recognition is a crucial metacognitive capability for AI systems, relevant not only for psychological analysis but also for safety, particularly in evaluative scenarios. Motivated by contradictory interpretations of whether models possess self-recognition (Panickssery et al., 2024; Davidson et al., 2024), we introduce a systematic evaluation framework that can be easily applied and updated. Specifically, we measure how well 10 contemporary large language models (LLMs) can identify their own generated text versus text from other models through two tasks: binary self-recognition and exact model prediction. Contrary to prior claims, our results reveal a consistent failure in self-recognition. Only 4 of the 10 models predict themselves as the generator, and performance rarely exceeds random chance. Additionally, models exhibit a strong bias toward predicting the GPT and Claude families. We also provide the first evaluation of models' awareness of their own and others' existence, as well as the reasoning behind their choices in self-recognition. We find that models demonstrate some knowledge of their own existence and of other models, but their reasoning reveals a hierarchical bias: they appear to assume that GPT, Claude, and occasionally Gemini are the top-tier models, often associating high-quality text with them. We conclude by discussing the implications of our findings for AI safety and future directions for developing appropriate AI self-awareness.
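The abstract's key quantitative claim is that binary self-recognition accuracy rarely exceeds random chance. A minimal sketch of how such a check can be done (an exact one-sided binomial test against a 50% chance baseline; the function names and the significance threshold here are our illustrative choices, not the paper's reported method):

```python
from math import comb

def binomial_p_value(k: int, n: int, p: float = 0.5) -> float:
    """One-sided p-value: probability of observing >= k successes
    in n independent trials if each succeeds with chance probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def exceeds_chance(correct: int, trials: int, alpha: float = 0.05) -> bool:
    """True if binary self-recognition accuracy is significantly above
    the 50% random baseline at significance level alpha."""
    return binomial_p_value(correct, trials) < alpha

# Hypothetical illustration: a model that gets 60 of 100 binary
# self-recognition trials right is just significant (p ≈ 0.028),
# while 55 of 100 is not (p ≈ 0.184).
print(exceeds_chance(60, 100))  # True
print(exceeds_chance(55, 100))  # False
```

Under this kind of test, a model needs to be correct in roughly 59+ of 100 trials before its self-recognition can be distinguished from coin-flipping, which is the sense in which "above random chance" is a nontrivial bar.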