Reinforcing privacy reasoning in LLMs via normative simulacra from fiction

📅 2026-04-21

🤖 AI Summary
This work addresses a specific problem: large language model (LLM) agents handle information in ways that are broadly misaligned with users’ context-specific privacy expectations, because they lack normative reasoning grounded in the situation at hand. The authors extract “normative simulacra” (structured representations of norms and information flows) from fiction novels and train on them in two stages: supervised fine-tuning (SFT) followed by GRPO-based reinforcement learning. The reward combines programmatic signals with an LLM judge that checks whether the model’s privacy reasoning is grounded in the held-out normative universe of the source text. A per-completion contrastive scoring step evaluates each completion against both the correct normative universe and a randomly selected wrong one. This teaches the model to condition on context, following Contextual Integrity (CI) principles, rather than memorize source-specific norms. Across five CI-aligned benchmarks and seven models, SFT alone makes models more conservative about information flow, which improves recognition of privacy-relevant situations but not the correctness of privacy judgments. GRPO with normative grounding achieves the highest score on a law compliance benchmark and the strongest correlation with crowdsourced human privacy expectations.
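The summary pairs SFT with GRPO. GRPO (Group Relative Policy Optimization, introduced in the DeepSeekMath work) samples a group of completions for each prompt. Instead of learning a value critic, it uses each completion's reward relative to the rest of its group as the advantage. The sketch below shows only that normalization step. It assumes population standard deviation and a small `eps` for numerical stability, and it omits the clipped policy-gradient objective and the KL penalty. It is not taken from the paper's code.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages for completions sampled from the same prompt.

    Each reward is centered on the group mean and scaled by the group's
    population standard deviation; `eps` avoids division by zero when all
    rewards are equal (both choices are assumptions of this sketch).
    """
    if not rewards:
        return []
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for one prompt, scored by some reward function.
advantages = group_relative_advantages([0.2, 0.8, 0.5, 0.5])
print([round(a, 3) for a in advantages])  # [-1.414, 1.414, 0.0, 0.0]
```

Completions scoring above their group's mean get positive advantages and are reinforced. A group of identical rewards yields all-zero advantages and therefore no learning signal.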

📝 Abstract
Information handling practices of LLM agents are broadly misaligned with the contextual privacy expectations of their users. Contextual Integrity (CI) provides a principled framework, defining privacy as the appropriate flow of information within context-relative norms. However, existing approaches either double inference cost via supervisor-assistant architectures, or fine-tune on narrow task-specific data. We propose extracting normative simulacra (structured representations of norms and information flows) from fiction novels and using them to fine-tune LLMs via supervised learning followed by GRPO reinforcement learning. Our composite reward function combines programmatic signals, including task clarity (subsuming schema validity, construct discrimination, and extraction confidence), structural completeness, internal consistency, and context identification, with an LLM judge that evaluates whether the model's privacy reasoning is grounded in the held-out normative universe of the source text. To mitigate overfitting, we introduce per-completion contrastive scoring: each completion is evaluated against both the correct normative universe and a randomly selected wrong one, teaching the model to condition on context rather than memorize source-specific norms. We evaluate on five CI-aligned benchmarks spanning distinct societal contexts and ablate the contributions of RL and normative grounding. Across seven models, SFT introduces a conservative prior toward restricting information flow, improving recognition of privacy-relevant situations but not the correctness of privacy judgments. GRPO with normative grounding achieves the highest score on a law compliance benchmark and the strongest correlation with crowdsourced human privacy expectations, demonstrating that fiction-derived normative simulacra can teach contextual privacy reasoning that transfers to real-world domains.
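The abstract names the reward's ingredients but does not say how they are combined. What follows is a minimal sketch built on several stated assumptions:

- Each programmatic signal is pre-normalized to [0, 1] and averaged with equal weight.
- The LLM judge is stood in for by a hypothetical callable `judge(completion, universe)` that returns a score in [0, 1].
- The contrastive term is the judge's score under the correct normative universe minus its score under one randomly chosen wrong universe.
- The weights `w_prog` and `w_judge` are hypothetical.

All names here are illustrative and are not the paper's API.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical judge interface: (completion, normative_universe) -> score in [0, 1].
Judge = Callable[[str, str], float]


@dataclass
class ProgrammaticSignals:
    """Programmatic reward signals named in the abstract, each assumed in [0, 1]."""
    task_clarity: float
    structural_completeness: float
    internal_consistency: float
    context_identification: float

    def mean(self) -> float:
        # Equal weighting is an assumption; the abstract gives no weights.
        return (self.task_clarity + self.structural_completeness
                + self.internal_consistency + self.context_identification) / 4


def contrastive_judge_score(completion: str, correct_universe: str,
                            universes: Sequence[str], judge: Judge,
                            rng: random.Random) -> float:
    """Judge score under the correct universe minus the score under a randomly
    selected wrong one (the subtraction is an assumed way to combine them)."""
    wrong_pool = [u for u in universes if u != correct_universe]
    if not wrong_pool:
        raise ValueError("need at least one wrong normative universe")
    wrong_universe = rng.choice(wrong_pool)
    return judge(completion, correct_universe) - judge(completion, wrong_universe)


def composite_reward(completion: str, signals: ProgrammaticSignals,
                     correct_universe: str, universes: Sequence[str],
                     judge: Judge, rng: random.Random,
                     w_prog: float = 0.5, w_judge: float = 0.5) -> float:
    """Weighted sum of the programmatic signals and the contrastive judge term."""
    contrast = contrastive_judge_score(completion, correct_universe,
                                       universes, judge, rng)
    return w_prog * signals.mean() + w_judge * contrast


# Hypothetical toy judge: 1.0 if the universe's last word appears in the completion.
def keyword_judge(completion: str, universe: str) -> float:
    return 1.0 if universe.split()[-1] in completion.lower() else 0.0


universes = ["norms of a royal court", "norms of a whaling ship",
             "norms of a boarding school"]
signals = ProgrammaticSignals(1.0, 1.0, 1.0, 1.0)
rng = random.Random(0)

grounded = composite_reward("Sharing the letter breaks the norms of the court.",
                            signals, universes[0], universes, keyword_judge, rng)
generic = composite_reward("Court, ship, or school: keep it private.",
                           signals, universes[0], universes, keyword_judge, rng)
print(grounded, generic)  # 1.0 0.5
```

With the toy `keyword_judge`, the completion that reasons from the court's norms earns the full contrastive bonus. The generic completion fits the wrong universe as well as the right one, so it earns nothing from the contrastive term. That gap is the behavior the contrastive scoring is meant to reward: conditioning on context instead of producing answers that fit any setting.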
Problem

Research questions and challenges this paper addresses.

Contextual Integrity
privacy reasoning
normative simulacra
large language models
information flow
Innovation

Methods, ideas, or system contributions that make the work stand out.

normative simulacra
contextual integrity
GRPO reinforcement learning
contrastive scoring
privacy reasoning