Residualized Similarity for Faithfully Explainable Authorship Verification

📅 2025-10-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing authorship verification methods achieve high accuracy but lack interpretability, as neural models cannot trace predictions back to original textual features. Method: We propose a residualized similarity modeling framework: a baseline similarity measure is first constructed from traditional interpretable features (e.g., lexical, syntactic, and stylometric statistics), and a neural network is then trained exclusively to model the prediction residual between this baseline and the ground-truth authorship labels. This design keeps inference grounded in human-understandable features while leveraging data-driven refinement for accuracy gains. Contribution/Results: The approach achieves state-of-the-art performance on four standard benchmarks, marking the first authorship verification method to simultaneously deliver high accuracy, strong interpretability, and prediction faithfulness to interpretable input features.
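The core mechanism described above can be sketched in a few lines: an interpretable baseline similarity is computed from hand-crafted feature vectors, a learned model contributes only a residual correction, and the final score is the sum of the two. This is a minimal illustration, not the authors' implementation; the cosine baseline, the feature vectors, and the `residual_model` callable are all hypothetical stand-ins.

```python
import numpy as np

def baseline_similarity(feats_a, feats_b):
    # Interpretable baseline: cosine similarity over hand-crafted
    # stylometric feature vectors (hypothetical feature choice).
    num = float(np.dot(feats_a, feats_b))
    denom = float(np.linalg.norm(feats_a) * np.linalg.norm(feats_b)) + 1e-9
    return num / denom

def residual_target(label, base):
    # Training signal for the neural component: it learns only the
    # gap between the interpretable score and the gold label.
    return label - base

def residualized_score(feats_a, feats_b, residual_model):
    # Final score = interpretable baseline + learned residual,
    # clipped to [0, 1] so it reads as a same-author score.
    base = baseline_similarity(feats_a, feats_b)
    resid = residual_model(feats_a, feats_b)
    return float(np.clip(base + resid, 0.0, 1.0)), base, resid
```

Because the baseline is computed from traceable features, one can report both `base` and `resid` alongside the final score and quantify how much of the prediction is attributable to the interpretable system versus the neural correction.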

📝 Abstract
Responsible use of Authorship Verification (AV) systems not only requires high accuracy but also interpretable solutions. More importantly, for systems to be used to make decisions with real-world consequences requires the model's prediction to be explainable using interpretable features that can be traced to the original texts. Neural methods achieve high accuracies, but their representations lack direct interpretability. Furthermore, LLM predictions cannot be explained faithfully -- if there is an explanation given for a prediction, it doesn't represent the reasoning process behind the model's prediction. In this paper, we introduce Residualized Similarity (RS), a novel method that supplements systems using interpretable features with a neural network to improve their performance while maintaining interpretability. Authorship verification is fundamentally a similarity task, where the goal is to measure how alike two documents are. The key idea is to use the neural network to predict a similarity residual, i.e. the error in the similarity predicted by the interpretable system. Our evaluation across four datasets shows that not only can we match the performance of state-of-the-art authorship verification models, but we can show how and to what degree the final prediction is faithful and interpretable.
Problem

Research questions and friction points this paper is trying to address.

Improving authorship verification accuracy while maintaining interpretable features
Enabling faithful explanations for neural network predictions in authorship analysis
Bridging the gap between interpretable systems and neural network performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Residualized Similarity supplements interpretable features with neural network
Neural network predicts similarity residual from interpretable system
Maintains interpretability while matching state-of-the-art performance