PersonaTrace: Synthesizing Realistic Digital Footprints with LLM Agents

2026-03-12
AI Summary
This work addresses the scarcity of digital footprint data, which hinders behavioral modeling and model generalization. To overcome this limitation, the authors propose a large language model (LLM) agent framework grounded in structured user personas, enabling the first systematic generation of high-fidelity, diverse user event sequences along with their corresponding digital artifacts, such as emails, messages, and calendar entries. The approach substantially enhances both the realism and diversity of synthetic data compared to existing baselines. Furthermore, models fine-tuned on the generated data demonstrate markedly improved out-of-distribution generalization on real-world tasks, underscoring the utility of the proposed framework for advancing behavioral AI research.

Abstract
Digital footprints (records of individuals' interactions with digital systems) are essential for studying behavior, developing personalized applications, and training machine learning models. However, research in this area is often hindered by the scarcity of diverse and accessible data. To address this limitation, we propose a novel method for synthesizing realistic digital footprints using large language model (LLM) agents. Starting from a structured user profile, our approach generates diverse and plausible sequences of user events, ultimately producing corresponding digital artifacts such as emails, messages, calendar entries, reminders, etc. Intrinsic evaluation results demonstrate that the generated dataset is more diverse and realistic than existing baselines. Moreover, models fine-tuned on our synthetic data outperform those trained on other synthetic datasets when evaluated on real-world out-of-distribution tasks.
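The profile-to-events-to-artifacts flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Persona`, `Event`, and `Artifact` types, the function names, the prompts, and the `stub_llm` stand-in are all hypothetical, chosen only to show the overall shape of such a pipeline. A real system would replace `stub_llm` with calls to an actual LLM agent.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# All names below are hypothetical illustrations, not taken from the paper.

@dataclass
class Persona:
    name: str
    occupation: str
    interests: List[str]

@dataclass
class Event:
    day: int
    description: str

@dataclass
class Artifact:
    kind: str      # e.g. "email", "message", "calendar", "reminder"
    event: Event
    body: str

LLM = Callable[[str], str]

def stub_llm(prompt: str) -> str:
    """Deterministic placeholder standing in for a real LLM call."""
    return f"[generated text for: {prompt}]"

def generate_events(persona: Persona, days: int, llm: LLM) -> List[Event]:
    """Step 1: turn a structured profile into a sequence of user events."""
    interests = persona.interests or ["general"]  # avoid an empty rotation
    events = []
    for day in range(days):
        interest = interests[day % len(interests)]
        prompt = (f"Propose an event on day {day} for {persona.name}, "
                  f"a {persona.occupation} interested in {interest}")
        events.append(Event(day=day, description=llm(prompt)))
    return events

def render_artifacts(events: Sequence[Event], llm: LLM,
                     kinds: Sequence[str] = ("email", "calendar")) -> List[Artifact]:
    """Step 2: produce one digital artifact of each kind per event."""
    artifacts = []
    for event in events:
        for kind in kinds:
            body = llm(f"Write a {kind} for: {event.description}")
            artifacts.append(Artifact(kind=kind, event=event, body=body))
    return artifacts

def synthesize(persona: Persona, days: int, llm: LLM = stub_llm) -> List[Artifact]:
    """Run both steps: profile -> events -> artifacts."""
    return render_artifacts(generate_events(persona, days, llm), llm)
```

Calling `synthesize(Persona("Alex", "teacher", ["hiking", "cooking"]), days=3)` yields two artifacts per simulated day. Keeping event generation separate from artifact rendering means the same event sequence can be rendered into different artifact types without regenerating it.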
Problem

Research questions and friction points this paper is trying to address.

digital footprints
data scarcity
LLM agents
synthetic data
user behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

digital footprints
LLM agents
synthetic data generation
user behavior simulation
personalized applications
Authors
Minjia Wang
Apple, Harvard University
Yunfeng Wang
Apple
Xiao Ma
Apple
Dexin Lv
Apple
Qifan Guo
Apple
Lynn Zheng
Apple
Benliang Wang
Apple
Lei Wang
Apple
GPS, Shadow Matching, Smartphone Navigation, Indoor Positioning, GNSS
Jiannan Li
Assistant Professor, Singapore Management University
human-computer interaction, human-robot interaction
Yongwei Xing
Apple
David Xu
Apple
Zheng Sun
Apple