CAVE: Controllable Authorship Verification Explanations

📅 2024-06-24
🏛️ arXiv.org
🤖 AI Summary
Existing offline authorship verification (AV) models suffer from limited accuracy and a lack of accessible post-hoc explanations, hindering deployment in privacy-sensitive applications that require on-premises models. To address these limitations, the authors develop CAVE (Controllable Authorship Verification Explanations), a trained offline model that generates free-text AV explanations controlled to have a uniform structure (decomposable into sub-explanations grounded in relevant linguistic features) and to be easily verified for explanation-label consistency. Silver-standard training data is constructed with a prompt-based method, Prompt-CAVE, then filtered for rationale-label consistency using a novel metric, Cons-R-L. A small offline model (Llama-3-8B) is fine-tuned on this data to produce CAVE. On three difficult AV datasets, CAVE generates high-quality explanations, as measured by both automatic metrics and human evaluation, while achieving competitive task accuracy.

📝 Abstract
Authorship Verification (AV) (do two documents have the same author?) is essential in many real-life applications. AV is often used in privacy-sensitive domains that require an offline proprietary model deployed on premises, making publicly served online models (APIs) a suboptimal choice. Current offline AV models, however, have lower downstream utility due to limited accuracy (e.g., traditional stylometry AV systems) and a lack of accessible post-hoc explanations. In this work, we address the above challenges by developing a trained, offline model CAVE (Controllable Authorship Verification Explanations). CAVE generates free-text AV explanations that are controlled to be (1) accessible (uniform structure that can be decomposed into sub-explanations grounded in relevant linguistic features), and (2) easily verified for explanation-label consistency. We generate silver-standard training data grounded in the desirable linguistic features by a prompt-based method, Prompt-CAVE. We then filter the data based on rationale-label consistency using a novel metric, Cons-R-L. Finally, we fine-tune a small, offline model (Llama-3-8B) with this data to create our model, CAVE. Results on three difficult AV datasets show that CAVE generates high-quality explanations (as measured by automatic and human evaluation) as well as competitive task accuracy.
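The pipeline described in the abstract (prompt-based silver-data generation, consistency filtering with Cons-R-L, then fine-tuning) can be illustrated with a minimal sketch of the filtering stage. The abstract does not define the actual Cons-R-L metric, so the scoring function below is a purely hypothetical proxy that checks whether the verdict stated in a structured explanation agrees with its silver label; the `verdict:` field, class names, and threshold are all assumptions, not the paper's method.

```python
import re
from dataclasses import dataclass

@dataclass
class SilverExample:
    doc_pair: tuple[str, str]  # the two documents being compared
    explanation: str           # free-text rationale in a uniform structure
    label: str                 # silver label: "same" or "different"

def rationale_label_consistency(example: SilverExample) -> float:
    """Toy stand-in for Cons-R-L: 1.0 if the verdict stated in the
    explanation agrees with the silver label, else 0.0."""
    m = re.search(r"verdict:\s*(same|different)", example.explanation.lower())
    if m is None:
        return 0.0  # no parseable verdict -> treat as inconsistent
    return 1.0 if m.group(1) == example.label else 0.0

def filter_silver_data(examples: list[SilverExample],
                       threshold: float = 0.5) -> list[SilverExample]:
    """Keep only examples whose rationale is consistent with the label."""
    return [ex for ex in examples
            if rationale_label_consistency(ex) >= threshold]
```

The surviving examples would then serve as fine-tuning data; filtering out label-inconsistent rationales is what lets a small model learn explanations that remain verifiable against its predictions.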
Problem

Research questions and friction points this paper is trying to address.

Develops offline authorship verification model for privacy-sensitive domains
Generates controllable free-text explanations for verification decisions
Ensures explanation accessibility and consistency with verification labels
Innovation

Methods, ideas, or system contributions that make the work stand out.

Offline proprietary model for authorship verification
Generates controlled free-text explanations with linguistic features
Fine-tunes small model with silver-standard training data
Sahana Ramnath
CS PhD student, INK Lab, University of Southern California
artificial intelligence, natural language processing, multimodal systems, deep learning
Kartik Pandey
Department of Computer Science, University of Southern California
Elizabeth Boschee
Information Sciences Institute, University of Southern California
Xiang Ren
Department of Computer Science, University of Southern California