PersonaTrace: Synthesizing Realistic Digital Footprints with LLM Agents

📅 2026-03-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the scarcity of diverse, accessible digital footprint data, which hinders behavioral research, personalized applications, and model training. The authors propose a large language model (LLM) agent framework that starts from a structured user profile, generates diverse and plausible sequences of user events, and then produces the corresponding digital artifacts, such as emails, messages, calendar entries, and reminders. In intrinsic evaluation, the generated dataset is more diverse and realistic than existing baselines. Models fine-tuned on the generated data also outperform models trained on other synthetic datasets when evaluated on real-world out-of-distribution tasks.

📝 Abstract

Digital footprints (records of individuals' interactions with digital systems) are essential for studying behavior, developing personalized applications, and training machine learning models. However, research in this area is often hindered by the scarcity of diverse and accessible data. To address this limitation, we propose a novel method for synthesizing realistic digital footprints using large language model (LLM) agents. Starting from a structured user profile, our approach generates diverse and plausible sequences of user events, ultimately producing corresponding digital artifacts such as emails, messages, calendar entries, reminders, etc. Intrinsic evaluation results demonstrate that the generated dataset is more diverse and realistic than existing baselines. Moreover, models fine-tuned on our synthetic data outperform those trained on other synthetic datasets when evaluated on real-world out-of-distribution tasks.
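
The three stages named in the abstract (structured profile, event sequence, digital artifacts) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Persona`, `Event`, `Artifact`, `generate_events`, `generate_artifacts`, `synthesize`) is hypothetical, the profile fields are assumed, and `stub_llm` is a deterministic placeholder standing in for a real LLM agent call so that the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# All types and functions below are hypothetical illustrations,
# not the data model or API used in the paper.

@dataclass
class Persona:
    name: str
    occupation: str
    interests: List[str] = field(default_factory=list)

@dataclass
class Event:
    day: int
    description: str

@dataclass
class Artifact:
    kind: str      # e.g. "email", "message", "calendar", "reminder"
    event: Event   # the event this artifact was derived from
    body: str

# An "LLM" here is just any function from prompt text to generated text.
LLM = Callable[[str], str]

def stub_llm(prompt: str) -> str:
    """Deterministic placeholder so the sketch runs without a model."""
    return f"[generated from: {prompt}]"

def generate_events(persona: Persona, n_days: int, llm: LLM) -> List[Event]:
    """Stage 1 (assumed): turn a structured profile into one event per day."""
    events = []
    for day in range(1, n_days + 1):
        if persona.interests:
            topic = persona.interests[(day - 1) % len(persona.interests)]
        else:
            topic = "work"
        prompt = (f"Day {day} event for {persona.name}, "
                  f"a {persona.occupation}, about {topic}")
        events.append(Event(day=day, description=llm(prompt)))
    return events

def generate_artifacts(events: List[Event], kinds: List[str], llm: LLM) -> List[Artifact]:
    """Stage 2 (assumed): render each event as one digital artifact."""
    artifacts = []
    for i, event in enumerate(events):
        kind = kinds[i % len(kinds)]  # cycle through artifact types
        body = llm(f"Write one {kind} artifact for: {event.description}")
        artifacts.append(Artifact(kind=kind, event=event, body=body))
    return artifacts

def synthesize(persona: Persona, n_days: int = 3, llm: LLM = stub_llm) -> List[Artifact]:
    """Run both stages: profile -> events -> artifacts."""
    kinds = ["email", "message", "calendar", "reminder"]
    return generate_artifacts(generate_events(persona, n_days, llm), kinds, llm)

if __name__ == "__main__":
    persona = Persona("Ada", "nurse", ["hiking", "cooking"])
    for artifact in synthesize(persona):
        print(artifact.kind, "| day", artifact.event.day, "|", artifact.body)
```

Keeping a reference from each artifact back to its source event is a design choice made for this sketch; it makes it easy to check that a generated email or calendar entry stays consistent with the event it was derived from.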

Problem

Research questions and friction points this paper is trying to address.

digital footprints

data scarcity

LLM agents

synthetic data

user behavior

Innovation

Methods, ideas, or system contributions that make the work stand out.

digital footprints

LLM agents

synthetic data generation