🤖 AI Summary
This work proposes Narriva, an approach to modeling user privacy decisions that simultaneously achieves individual-level accuracy, prompt usability, token efficiency, and population-level representativeness, a combination that existing methods struggle to balance. Narriva integrates large-scale empirical privacy survey data with established privacy theories to construct structured, readable, and token-efficient textual synthetic personas using large language models. By combining a theory-driven framework with data compression, the method enables effective in-context learning and behavior prediction. Evaluated across five datasets, Narriva attains up to 88% prediction accuracy while reducing prompt token usage by 80–95% compared to in-context learning with raw examples. It also replicates aggregate privacy behavior distributions across studies (TVComplement up to 0.85) without relying on demographic stereotypes.
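For readers unfamiliar with the metric: TVComplement scores how closely a synthetic categorical distribution matches a real one. Assuming the standard definition (one minus the total variation distance, as used, e.g., in the SDMetrics library), it is

\[
\mathrm{TVComplement}(P, Q) \;=\; 1 - \mathrm{TVD}(P, Q) \;=\; 1 - \frac{1}{2} \sum_{\omega \in \Omega} \bigl|\, P(\omega) - Q(\omega) \,\bigr|,
\]

where \(P\) is the empirical distribution of observed privacy decisions, \(Q\) the distribution produced by the synthetic personas, and \(\Omega\) the set of decision outcomes; a score of 1 means the distributions are identical. For example, if \(P = (0.6, 0.4)\) and \(Q = (0.5, 0.5)\) over allow/deny decisions, then \(\mathrm{TVD} = 0.1\) and \(\mathrm{TVComplement} = 0.9\).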
📝 Abstract
The ability to simulate human privacy decisions has significant implications for aligning autonomous agents with individual intent and for conducting cost-effective, large-scale privacy-centric user studies. Prior approaches prompt Large Language Models (LLMs) with natural language user statements, data-sharing histories, or demographic attributes to simulate privacy decisions. These approaches, however, fail to balance individual-level accuracy, prompt usability, token efficiency, and population-level representation. We present Narriva, an approach that generates text-based synthetic privacy personas to address these shortcomings. Narriva grounds persona generation in prior user privacy decisions, such as those from large-scale survey datasets, rather than relying purely on demographic stereotypes. It compresses this data into concise, human-readable summaries structured by established privacy theories. Benchmarking across five diverse datasets, we analyze how well Narriva's synthetic personas model both individual- and population-level privacy preferences. We find that grounding personas in past privacy behaviors achieves up to 88% predictive accuracy, significantly outperforming a non-personalized LLM baseline, and yields an 80–95% reduction in prompt tokens compared to in-context learning with raw examples. Finally, we demonstrate that personas synthesized from a single survey can reproduce the aggregate privacy behaviors and statistical distributions (TVComplement up to 0.85) of entirely different studies.
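To make the prompting contrast concrete, here is a minimal, illustrative sketch of the two strategies the abstract compares: in-context learning with raw decision examples versus a compressed, theory-structured persona summary. All names, records, and the persona text below are hypothetical placeholders, not the paper's actual implementation, and the LLM call is stubbed out.

```python
# Illustrative sketch: raw-example in-context learning vs. a compressed,
# theory-structured persona prompt. All data and helper names here are
# hypothetical; they do not come from the Narriva paper's codebase.

# Hypothetical survey records: one user's past privacy decisions.
history = [
    {"scenario": "Fitness app requests location for run tracking", "decision": "allow"},
    {"scenario": "Shopping app requests contacts for referrals", "decision": "deny"},
    {"scenario": "Smart speaker stores voice recordings for personalization", "decision": "deny"},
    # ... in practice, dozens to hundreds of such examples
]

# Strategy 1: in-context learning with raw examples (token-heavy).
raw_prompt = "Past decisions by this user:\n" + "\n".join(
    f"- {r['scenario']} -> {r['decision']}" for r in history
)

# Strategy 2: a compressed persona summary, structured along theory-derived
# dimensions (the wording below is illustrative, not the paper's schema).
persona_prompt = (
    "Persona: Shares data when it serves a clear, immediate purpose "
    "(e.g., location for navigation) but denies secondary uses such as "
    "personalization, advertising, or social features."
)

query = (
    "\nNew scenario: Music app requests microphone access for voice search.\n"
    "Predict the user's decision (allow/deny):"
)

def query_llm(prompt: str) -> str:
    """Stub standing in for an actual LLM API call."""
    raise NotImplementedError("plug in your LLM client here")

# A crude proxy for prompt size; a real system would count tokens with
# the model's own tokenizer.
for name, prompt in [("raw examples", raw_prompt), ("persona", persona_prompt)]:
    print(f"{name}: ~{len(prompt.split())} words")
    # prediction = query_llm(prompt + query)
```

The token savings reported above come from replacing the long example list with a short persona summary while preserving the behavioral signal the LLM needs to predict the new decision.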