Measuring Embedding Sensitivity to Authorial Style in French: Comparing Literary Texts with Language Model Rewritings

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates how much authorial style information is captured by text embeddings and how much of it remains detectable after the texts are rewritten by large language models (LLMs). Using a controlled French literary dataset, the authors compare stylistic signals in original and LLM-rewritten texts by analyzing the embedding space and measuring changes in embedding dispersion. They report that, in French, embeddings reliably capture author-specific stylistic features and that these signals persist after LLM rewriting, alongside additional patterns specific to the rewriting model. These findings suggest directions for detecting authorship imitation in machine-generated text.

📝 Abstract

Large language models (LLMs) can convincingly imitate human writing styles, yet it remains unclear how much stylistic information is encoded in embeddings from any language model and retained after LLM rewriting. We investigate these questions in French, using a controlled literary dataset to quantify the effect of stylistic variation via changes in embedding dispersion. We observe that embeddings reliably capture authorial stylistic features and that these signals persist after rewriting, while also exhibiting LLM-specific patterns. These analytical results offer promising directions for authorship imitation detection in the era of language models.
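
The abstract does not say how embedding dispersion is computed. As a minimal sketch, assume dispersion is the mean cosine distance of each text embedding to the centroid of its group. The function name `dispersion`, the synthetic vectors, and the 16-dimensional space are illustrative assumptions, not the authors' setup; in practice the rows would be embeddings of French passages from any sentence encoder.

```python
import numpy as np

def dispersion(embeddings: np.ndarray) -> float:
    """Mean cosine distance of each row to the group centroid.

    Assumption: this is one plausible dispersion measure, not
    necessarily the one used in the paper.
    """
    X = np.asarray(embeddings, dtype=float)
    # L2-normalise each embedding so only direction matters.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    centroid = X.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    # Cosine distance = 1 - cosine similarity to the centroid.
    return float(np.mean(1.0 - X @ centroid))

# Synthetic stand-ins for two authors' text embeddings (hypothetical data):
# each author is a noisy cloud around a different direction.
rng = np.random.default_rng(0)
dim = 16
author_a = rng.normal(size=(50, dim)) + 5.0 * np.eye(dim)[0]
author_b = rng.normal(size=(50, dim)) + 5.0 * np.eye(dim)[1]

# Average dispersion within each author versus dispersion of the pooled set.
within = (dispersion(author_a) + dispersion(author_b)) / 2
pooled = dispersion(np.vstack([author_a, author_b]))

print(f"within-author dispersion: {within:.3f}")
print(f"pooled dispersion:        {pooled:.3f}")
```

If embeddings encode authorial style, pooling texts from different authors should spread the points more widely than texts from a single author. Under the same assumption, comparing these values before and after LLM rewriting would indicate how much of that separation survives.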

Problem

Research questions and friction points this paper is trying to address.

authorial style

embedding sensitivity

language model rewriting

stylistic variation

French literary texts

Innovation

Methods, ideas, or system contributions that make the work stand out.

embedding sensitivity

authorial style

language model rewriting