Measuring and Mitigating Persona Distortions from AI Writing Assistance

📅 2026-04-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study examines how AI writing assistance distorts writer personas, that is, the beliefs, personality, and identity that readers attribute to a writer. In three large-scale experiments, writers composed political opinion paragraphs with and without AI assistance, and separate groups of readers blindly rated the paragraphs on 29 dimensions. With AI assistance, writers seemed more opinionated, competent, and positive, and their perceived demographic profile shifted toward more privileged groups. Using the experimental data (10,008 paragraphs, 2,903,596 ratings), the authors trained reward models to steer AI outputs toward faithful representation of writer stance. This mitigated objectionable distortions but reduced user acceptance, suggesting that the desirable and undesirable properties of AI writing assistance are entangled, with implications for public discourse and trust.

📝 Abstract

Hundreds of millions of people use artificial intelligence (AI) for writing assistance. Here, we evaluated how AI writing assistance distorts writer personas - their perceived beliefs, personality, and identity. In three large-scale experiments, writers (N=2,939) wrote political opinion paragraphs with and without AI assistance. Separate groups of readers (N=11,091) blindly evaluated these paragraphs across 29 socially salient dimensions of reader perception, spanning political opinion, writing quality, writer personality, emotions, and demographics. AI writing assistance produced persona distortions across all dimensions: with AI, writers seemed more opinionated, competent, and positive, and their perceived demographic profile shifted towards more privileged groups. Writers objected to many of the observed distortions, yet continued to prefer AI-assisted text even when made aware of them. We successfully mitigated objectionable persona distortions at the model level by training reward models on our experimental data (10,008 paragraphs, 2,903,596 ratings) to steer AI outputs towards faithful representation of writer stance. However, this came at a cost to user acceptance, suggesting an entanglement between desirable and undesirable properties of AI writing assistance that may be difficult to resolve. Together, our findings demonstrate that persona distortions from AI writing assistance are pervasive and persistent even under realistic conditions of human oversight, which carries implications for public discourse, trust, and democratic deliberation that scale with AI adoption.
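
One simple way to picture a per-dimension persona distortion is as the gap between the mean reader rating of AI-assisted paragraphs and the mean rating of unassisted ones. The sketch below illustrates that idea only; the abstract does not state the estimator the authors used, and the function name, condition labels, rating scale, and toy data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def persona_distortion(ratings):
    """Return {dimension: mean(AI-assisted) - mean(unassisted)}.

    `ratings` is an iterable of (dimension, condition, score) tuples,
    where condition is "ai" or "no_ai". These labels are assumptions
    made for illustration, not the paper's data format.
    """
    by_cell = defaultdict(list)
    for dimension, condition, score in ratings:
        by_cell[(dimension, condition)].append(score)

    result = {}
    for dimension in sorted({d for d, _ in by_cell}):
        ai = by_cell.get((dimension, "ai"))
        no_ai = by_cell.get((dimension, "no_ai"))
        # Only report dimensions rated under both conditions.
        if ai and no_ai:
            result[dimension] = mean(ai) - mean(no_ai)
    return result


# Made-up ratings, not taken from the study.
toy = [
    ("competent", "ai", 6), ("competent", "ai", 5),
    ("competent", "no_ai", 4), ("competent", "no_ai", 3),
    ("opinionated", "ai", 5), ("opinionated", "no_ai", 4),
]
distortion = persona_distortion(toy)
```

A positive value means readers rated AI-assisted writers higher on that dimension. A real analysis would also need uncertainty estimates and would have to account for repeated ratings by the same reader, which this sketch ignores.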

Problem

Research questions and friction points this paper is trying to address.

persona distortion

AI writing assistance

writer identity

perception bias

human-AI collaboration

Innovation

Methods, ideas, or system contributions that make the work stand out.

persona distortion

AI writing assistance

reward modeling