AI Summary
This study investigates whether large language models can authentically simulate the linguistic and behavioral characteristics of specific individuals using long-term personal data. Leveraging a decade of private messaging data from volunteers, the authors develop an individual simulation system integrating fine-tuning, retrieval-augmented generation (RAG), memory mechanisms, and hybrid strategies. They further introduce, for the first time, an "Individual Turing Test" evaluation framework, wherein responses are judged for authenticity by people personally acquainted with the target individual. The findings reveal a fundamental trade-off between parametric and non-parametric approaches in individual modeling: while current methods perform adequately in evaluations by strangers, they fail to pass the more stringent test by acquaintances. Fine-tuning excels at capturing stylistic nuances, whereas RAG and memory-based mechanisms demonstrate superior performance in addressing questions involving personal preferences.
Abstract
Large Language Models (LLMs) have demonstrated remarkable human-like capabilities, yet their ability to replicate a specific individual remains under-explored. This paper presents a case study of LLM-based individual simulation using a volunteer-contributed archive of private messaging history spanning over ten years. Based on the messaging data, we propose the "Individual Turing Test" to evaluate whether acquaintances of the volunteer can correctly identify which response in a multi-candidate pool most plausibly comes from the volunteer. We investigate prevalent LLM-based individual simulation approaches, including fine-tuning, retrieval-augmented generation (RAG), a memory-based approach, and hybrid methods that combine fine-tuning with RAG or memory. Empirical results show that current LLM-based simulation methods do not pass the Individual Turing Test, but they perform substantially better when the same test is conducted with strangers to the target individual. Additionally, while fine-tuning improves the simulation of daily chats that reflect the individual's language style, retrieval-augmented and memory-based approaches demonstrate stronger performance on questions involving personal opinions and preferences. These findings reveal a fundamental trade-off between parametric and non-parametric approaches to individual simulation with LLMs when given a longitudinal context.
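The Individual Turing Test described above reduces to a simple scoring rule: each judge sees a multi-candidate pool and tries to pick the genuine response; the simulation "passes" when judges do no better than chance. A minimal sketch of that scoring logic, with the pass criterion, margin, and function names being illustrative assumptions rather than the paper's actual evaluation code:

```python
def turing_test_accuracy(trials):
    """Each trial is (judge_choice_index, true_index).

    Returns the fraction of trials where the judge correctly
    identified the volunteer's real response in the pool.
    """
    correct = sum(1 for choice, true in trials if choice == true)
    return correct / len(trials)


def passes_test(accuracy, num_candidates, margin=0.05):
    """Hypothetical pass criterion: the simulation is indistinguishable
    when judge accuracy stays within `margin` of chance (1/num_candidates).
    """
    return accuracy <= 1.0 / num_candidates + margin


# Illustrative trials with 4-candidate pools: judges were correct in 2 of 4.
trials = [(0, 0), (2, 1), (3, 3), (1, 0)]
acc = turing_test_accuracy(trials)
print(acc)                              # 0.5
print(passes_test(acc, num_candidates=4))  # False: well above 0.25 chance
```

With acquaintances as judges, accuracy stays well above chance (the test is failed); with strangers, accuracy drops toward the chance baseline, which is the gap the abstract reports.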