🤖 AI Summary
This work addresses the problem that psychiatric narratives disclose patient identity not only through explicit identifiers but also implicitly, through personalized life events embedded in their clinical structure, so conventional de-identification methods struggle to balance clinical fidelity against privacy. The authors propose a graph-guided semantic rewriting framework with three steps: first, a semantic graph is constructed to represent clinical entities, temporal anchors, and their typed relations; then, graph-constrained perturbations alter identifying context while preserving critical diagnostic structure; finally, a graph-conditioned large language model generates the de-identified text. Evaluated on 90 clinician-authored case narratives, the approach, described as the first to integrate structured semantic graphs into psychiatric text privacy, substantially reduces both re-identification risk and semantic similarity to the source narrative compared with an LLM-only rewriting baseline, while maintaining diagnostic fidelity and giving fine-grained control over which content is retained and which is modified.
📝 Abstract
Psychiatric narratives encode patient identity not only through explicit identifiers but also through idiosyncratic life events embedded in their clinical structure. Existing de-identification approaches, including PHI masking and LLM-based synthetic rewriting, operate at the text level and offer limited control over which semantic elements are preserved or altered. We introduce Anonpsy, a de-identification framework that reformulates the task as graph-guided semantic rewriting. Anonpsy (1) converts each narrative into a semantic graph encoding clinical entities, temporal anchors, and typed relations; (2) applies graph-constrained perturbations that modify identifying context while preserving clinically essential structure; and (3) regenerates text via graph-conditioned LLM generation. Evaluated on 90 clinician-authored psychiatric case narratives, Anonpsy preserves diagnostic fidelity while achieving consistently low re-identification risk under expert, semantic, and GPT-5-based evaluations. Compared with a strong LLM-only rewriting baseline, Anonpsy yields substantially lower semantic similarity and identifiability. These results demonstrate that explicit structural representations combined with constrained generation provide an effective approach to de-identification for psychiatric narratives.
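The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not Anonpsy's implementation. The class names, the `essential` flag, the uniform age shift, and the prompt wording are all assumptions made for illustration, and the example narrative is invented toy data. The sketch stops at building the prompt and does not call an LLM.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str        # hypothetical kinds: "symptom", "life_event", "time"
    text: str
    essential: bool  # assumption: True marks clinically essential content


@dataclass(frozen=True)
class Edge:
    src: str
    relation: str    # typed relation, e.g. "preceded_by"
    dst: str


@dataclass
class SemanticGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)


def perturb(graph: SemanticGraph, substitutions: Dict[str, str],
            age_shift: int) -> SemanticGraph:
    """Return a new graph with identifying context altered.

    Essential nodes are copied unchanged and every edge is kept, so the
    typed structure survives. Time nodes holding an integer age are all
    shifted by the same offset, which preserves intervals between events.
    """
    new_nodes = {}
    for node_id, node in graph.nodes.items():
        if node.essential:
            new_nodes[node_id] = node
        elif node.kind == "time" and node.text.isdigit():
            new_nodes[node_id] = replace(node, text=str(int(node.text) + age_shift))
        else:
            new_text = substitutions.get(node_id, node.text)
            new_nodes[node_id] = replace(node, text=new_text)
    return SemanticGraph(nodes=new_nodes, edges=list(graph.edges))


def to_prompt(graph: SemanticGraph) -> str:
    """Linearize the graph into triples that could condition an LLM."""
    triples = [
        f"({graph.nodes[e.src].text}) -[{e.relation}]-> ({graph.nodes[e.dst].text})"
        for e in graph.edges
    ]
    header = "Rewrite as a clinical narrative using only these facts:"
    return header + "\n" + "\n".join(triples)


# Toy example (invented data, not from the evaluated narratives).
graph = SemanticGraph(
    nodes={
        "s1": Node("s1", "symptom", "persistent low mood", True),
        "e1": Node("e1", "life_event", "lost job at a shipyard", False),
        "t1": Node("t1", "time", "34", False),
    },
    edges=[Edge("s1", "preceded_by", "e1"), Edge("s1", "onset_at_age", "t1")],
)
safe = perturb(graph, {"e1": "lost job at a warehouse"}, age_shift=2)
print(to_prompt(safe))
```

Because the perturbation step acts on nodes rather than on raw text, it is explicit which elements are held fixed (the symptom and both relations) and which are changed (the life event and the age), which is the kind of control the abstract contrasts with text-level rewriting.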