AttnDiff: Attention-based Differential Fingerprinting for Large Language Models

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of tracing the provenance of open-weight large language models after they have been fine-tuned, pruned, or merged. The authors propose a white-box fingerprinting framework grounded in the attention mechanism. The method constructs minimally edited prompt pairs that induce semantic conflicts and captures the resulting differential attention patterns, which reflect the model's internal information-routing behavior. These patterns are compressed into spectral descriptors, and models are compared using centered kernel alignment (CKA) similarity. The approach needs only 5–60 multi-domain probe prompts. On the Llama-2/3 and Qwen2.5 families (3B–14B parameters) with 60 probes, it yields similarity scores above 0.98 for derivative models and below 0.22 for unrelated model families, and the authors report that it substantially outperforms existing techniques.
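The final comparison step, scoring two models' descriptor sets with CKA, can be illustrated with a short NumPy sketch. The source does not say which CKA variant AttnDiff uses. This sketch assumes the linear variant popularised by Kornblith et al. (2019). It also assumes that each model is represented by an M × d matrix with one row of spectral descriptors per probe prompt. The function name `linear_cka` and the synthetic `victim`, `derived`, and `unrelated` matrices are illustrative only and are not the authors' code.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two feature matrices whose rows are the same probes.

    Assumed layout (not specified by the source): x is (M, d1), y is (M, d2),
    one row per probe prompt; the feature widths d1 and d2 may differ.
    Returns a similarity in [0, 1].
    """
    if x.ndim != 2 or y.ndim != 2 or x.shape[0] != y.shape[0]:
        raise ValueError("x and y must be 2-D with the same number of rows (probes)")
    # Column-centre so constant offsets do not contribute to similarity.
    xc = x - x.mean(axis=0, keepdims=True)
    yc = y - y.mean(axis=0, keepdims=True)
    # Linear CKA = ||Yc^T Xc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F)
    cross = np.linalg.norm(yc.T @ xc, ord="fro") ** 2
    self_x = np.linalg.norm(xc.T @ xc, ord="fro")
    self_y = np.linalg.norm(yc.T @ yc, ord="fro")
    return float(cross / (self_x * self_y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    victim = rng.normal(size=(60, 16))  # hypothetical: M = 60 probes, 16 features
    q, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal matrix
    derived = 2.5 * victim @ q  # synthetic rotated and rescaled copy
    unrelated = rng.normal(size=(60, 16))  # synthetic independent features
    print(linear_cka(victim, derived))    # 1.0 up to floating-point error
    print(linear_cka(victim, unrelated))  # noticeably lower
```

Linear CKA is invariant to orthogonal transformations and isotropic rescaling of the features, which is why the rotated, rescaled copy scores 1.0. Even independent random features score well above zero when the feature width is not small relative to the number of probes. This is one practical reason to keep per-probe descriptors compact when only 5–60 probes are available.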
📝 Abstract
Protecting the intellectual property of open-weight large language models (LLMs) requires verifying whether a suspect model is derived from a victim model despite common laundering operations such as fine-tuning (including PPO/DPO), pruning/compression, and model merging. We propose AttnDiff, a data-efficient white-box framework that extracts fingerprints from models via intrinsic information-routing behavior. AttnDiff probes minimally edited prompt pairs that induce controlled semantic conflicts, captures differential attention patterns, summarizes them with compact spectral descriptors, and compares models using CKA. Across Llama-2/3 and Qwen2.5 (3B–14B) and additional open-source families, it yields high similarity for related derivatives while separating unrelated model families (e.g., >0.98 vs. <0.22 with M = 60 probes). With 5–60 multi-domain probes, it supports practical provenance verification and accountability.
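The abstract's per-probe pipeline can be sketched in a few lines: probe a minimally edited prompt pair, take the difference between the two attention patterns, and summarise it with a compact spectral descriptor. The abstract does not define the descriptor, so this sketch makes one plausible choice under stated assumptions. It uses attention maps from a single layer with shape (heads, seq, seq) and assumes an edit that keeps the token count unchanged. The descriptor is the top-k singular values of each head's difference matrix, normalised per head. `spectral_descriptor`, `fingerprint_matrix`, and `k` are hypothetical names. The random softmax maps in the usage block merely stand in for attention weights extracted from a real model.

```python
import numpy as np


def spectral_descriptor(attn_orig: np.ndarray, attn_edit: np.ndarray, k: int = 8) -> np.ndarray:
    """Compress one prompt pair's differential attention into a fixed-length vector.

    attn_orig, attn_edit: (heads, seq, seq) attention weights from one layer for
    the original and the minimally edited prompt. ASSUMPTION: the edit keeps the
    token count unchanged, so the two maps align position by position.
    Returns a (heads * k,) vector of per-head normalised top-k singular values.
    """
    if attn_orig.shape != attn_edit.shape:
        raise ValueError("attention maps must have identical shapes")
    diff = attn_orig - attn_edit  # differential attention, one matrix per head
    # Batched SVD over the leading (heads) axis; values come back in descending order.
    sv = np.linalg.svd(diff, compute_uv=False)  # shape (heads, seq)
    top = sv[:, :k]
    if top.shape[1] < k:  # short prompts: zero-pad to a fixed width
        top = np.pad(top, ((0, 0), (0, k - top.shape[1])))
    # Unit-normalise each head's spectrum so its shape, not its scale, matters.
    norms = np.linalg.norm(top, axis=1, keepdims=True)
    top = top / np.maximum(norms, 1e-12)
    return top.reshape(-1)


def fingerprint_matrix(pairs: list[tuple[np.ndarray, np.ndarray]], k: int = 8) -> np.ndarray:
    """Stack one descriptor per probe pair into an (M, heads * k) matrix."""
    return np.stack([spectral_descriptor(a, b, k) for a, b in pairs])


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def random_attention(heads: int, seq: int) -> np.ndarray:
        # Row-wise softmax of random logits stands in for real attention maps.
        logits = rng.normal(size=(heads, seq, seq))
        weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
        return weights / weights.sum(axis=-1, keepdims=True)

    probes = [(random_attention(4, 12), random_attention(4, 12)) for _ in range(5)]
    fp = fingerprint_matrix(probes)
    print(fp.shape)  # (5, 32): M = 5 probes, 4 heads x k = 8 values each
```

Stacking one descriptor per probe yields an M × (heads · k) matrix per model. Two such matrices built from the same M probes can then be compared with CKA. Using singular values rather than raw attention entries keeps the vector length fixed at heads · k regardless of prompt length.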
Problem

intellectual property
large language models
model fingerprinting
model provenance
white-box verification
Innovation

attention-based fingerprinting
differential probing
spectral descriptors
model provenance
white-box verification