LLM one-shot style transfer for Authorship Attribution and Verification

📅 2025-10-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing computational stylometry methods are vulnerable to spurious data correlations, often conflate writing style with topic, and underuse large language models (LLMs) for authorship analysis. To address these limitations, the authors propose an unsupervised metric built on LLMs' causal language modeling pre-training and in-context learning capabilities: authorship attribution and verification are performed by using the model's log-probabilities to measure how well one text's style transfers to another, without any labeled data. When topical correlations are controlled for, the method outperforms both contrastively trained baselines and prompting approaches of comparable scale. Accuracy also improves fairly consistently with the size of the base model and, for verification, with additional test-time computation, enabling flexible trade-offs between computational cost and accuracy.

📝 Abstract
Computational stylometry analyzes writing style through quantitative patterns in text, supporting applications from forensic tasks such as identity linking and plagiarism detection to literary attribution in the humanities. Supervised and contrastive approaches rely on data with spurious correlations and often confuse style with topic. Despite their natural use in AI-generated text detection, the causal language modeling (CLM) pre-training of modern LLMs has been scarcely leveraged for general authorship problems. We propose a novel unsupervised approach based on this extensive pre-training and the in-context learning capabilities of LLMs, employing the log-probabilities of an LLM to measure style transferability from one text to another. Our method significantly outperforms LLM prompting approaches of comparable scale and achieves higher accuracy than contrastively trained baselines when controlling for topical correlations. Moreover, performance scales fairly consistently with the size of the base model and, in the case of authorship verification, with an additional mechanism that increases test-time computation, enabling flexible trade-offs between computational cost and accuracy.
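The core idea in the abstract — scoring how well one text's style "transfers" to another via model log-probabilities, then attributing the target to the author whose sample yields the largest conditional gain — can be illustrated with a toy sketch. Note the assumptions: this uses a smoothed character-bigram model as a crude stand-in for an LLM conditioned on a one-shot style example, and the function names (`avg_bigram_logprob`, `attribute`) are hypothetical, not from the paper.

```python
import math
from collections import Counter

def avg_bigram_logprob(text, reference, vocab, alpha=1.0):
    """Average log-probability of `text` under an add-alpha smoothed
    character-bigram model estimated from `reference`.
    Toy analogue of conditioning an LLM on a one-shot style example."""
    counts = Counter(zip(reference, reference[1:]))  # bigram counts
    prefix = Counter(reference[:-1])                 # first-character counts
    v = len(vocab)
    pairs = list(zip(text, text[1:]))
    total = sum(math.log((counts[p] + alpha) / (prefix[p[0]] + alpha * v))
                for p in pairs)
    return total / max(len(pairs), 1)

def attribute(target, author_samples, alpha=1.0):
    """Attribute `target` to the author whose sample gives the largest
    log-probability gain over an unconditioned (empty-reference) baseline."""
    vocab = set(target).union(*map(set, author_samples.values()))
    baseline = avg_bigram_logprob(target, "", vocab, alpha)
    scores = {author: avg_bigram_logprob(target, sample, vocab, alpha) - baseline
              for author, sample in author_samples.items()}
    return max(scores, key=scores.get), scores

# Hypothetical one-shot samples with deliberately distinct character styles.
samples = {"A": "aaaa bbbb aaaa bbbb aaaa",
           "B": "xyz xyz xyz xyz xyz xyz"}
best, scores = attribute("aaaa bbbb", samples)
```

In the paper's actual method, the bigram model would be replaced by an LLM's token log-probabilities with the candidate author's text placed in-context; the attribution rule (maximize the conditional log-probability gain) is the same shape.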
Problem

Research questions and friction points this paper is trying to address.

Unsupervised authorship analysis using LLM pre-training and style transferability
Addressing confusion between writing style and topic in stylometry
Leveraging LLM log-probabilities for authorship attribution and verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM one-shot style transfer for authorship analysis
Unsupervised approach using LLM log-probabilities for style measurement
Performance scales with model size and with additional test-time computation