Watermark Robustness and Radioactivity May Be at Odds in Federated Learning

📅 2025-10-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
In federated learning (FL), clients inject watermark data to enable data provenance for large language models (LLMs); however, the server, acting as an active adversary, can employ robust aggregation to filter out watermark signals, creating a fundamental trade-off among radioactivity (detectability after fine-tuning), robustness (resistance to filtering), and model utility. This paper is the first to formally characterize this three-way incompatibility. The authors propose a unified evaluation framework integrating radioactive watermarks, robust aggregation mechanisms, and statistical significance testing (via *p*-value analysis). Empirical results show that as little as 6.6% watermark data yields highly significant detection (*p*-values as low as 10⁻²⁴), yet all evaluated radioactive watermarking schemes fail under strong robust aggregation, demonstrating a critical breakdown of provenance capability in FL under active defense.
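The significance test in the summary can be illustrated with a hypothetical green-list-style detection sketch: under the null hypothesis (no watermark), each scored token falls into the "green" vocabulary partition with probability `gamma`, so a one-sided binomial tail gives the *p*-value. The function names and the `gamma` parameter here are illustrative assumptions, not the paper's exact detector.

```python
from math import comb

def binom_sf(k: int, n: int, p: float) -> float:
    """One-sided upper tail P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def watermark_p_value(green_count: int, total_tokens: int, gamma: float = 0.5) -> float:
    """p-value of seeing at least green_count green tokens out of total_tokens
    if the model were NOT watermarked (each token green w.p. gamma)."""
    return binom_sf(green_count, total_tokens, gamma)

# Example: 700 of 1000 scored tokens are green with gamma = 0.5,
# giving an extremely small p-value (strong evidence of radioactivity).
p = watermark_p_value(700, 1000, 0.5)
```

A lower *p*-value means the observed green-token excess is harder to explain by chance, which is how a detectable watermark "survives" fine-tuning in the radioactive setting.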

📝 Abstract
Federated learning (FL) enables fine-tuning large language models (LLMs) across distributed data sources. As these sources increasingly include LLM-generated text, provenance tracking becomes essential for accountability and transparency. We adapt LLM watermarking for data provenance in FL, where a subset of clients compute local updates on watermarked data and the server averages all updates into the global LLM. In this setup, watermarks are radioactive: the watermark signal remains detectable after fine-tuning with high confidence. The $p$-value can reach $10^{-24}$ even when as little as $6.6\%$ of the data is watermarked. However, the server can act as an active adversary that wants to preserve model utility while evading provenance tracking. Our observation is that updates induced by watermarked synthetic data appear as outliers relative to non-watermark updates. Our adversary thus applies strong robust aggregation that can filter these outliers, together with the watermark signal. None of the evaluated radioactive watermarks is robust against such an active filtering server. Our work suggests fundamental trade-offs between radioactivity, robustness, and utility.
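The server-side defense the abstract describes can be sketched with a standard robust aggregator such as a coordinate-wise trimmed mean, which drops the most extreme client values per coordinate and thereby filters outlier updates induced by watermarked data. This is an illustrative stand-in, not the paper's exact aggregation mechanism.

```python
from typing import List

def trimmed_mean(updates: List[List[float]], trim: int) -> List[float]:
    """Coordinate-wise trimmed mean: for each coordinate, discard the `trim`
    smallest and `trim` largest client values, then average the rest."""
    n, dim = len(updates), len(updates[0])
    assert n > 2 * trim, "need enough clients to trim from both ends"
    aggregated = []
    for j in range(dim):
        column = sorted(u[j] for u in updates)
        kept = column[trim : n - trim]  # drop extremes on both sides
        aggregated.append(sum(kept) / len(kept))
    return aggregated

# Example: one client submits a watermark-induced outlier update;
# trimming one value from each end removes its influence entirely.
benign = [[0.1, -0.2], [0.0, -0.1], [0.2, -0.3], [0.1, -0.2]]
outlier = [[5.0, 4.0]]  # update induced by watermarked data
agg = trimmed_mean(benign + outlier, trim=1)
```

Because the watermark signal rides on exactly these outlier coordinates, filtering them preserves utility (the benign average survives) while suppressing radioactivity, which is the trade-off the paper studies.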
Problem

Research questions and friction points this paper is trying to address.

Investigating watermark robustness versus radioactivity trade-offs in federated learning
Detecting provenance tracking evasion by active adversarial servers in FL
Analyzing outlier-based watermark removal while preserving model utility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapted LLM watermarking for data provenance in federated learning
Showed that watermarks remain radioactive (detectable) after fine-tuning
Showed that active server-side filtering removes watermarks at some cost to utility
Leixu Huang
Georgia Institute of Technology
Zedian Shao
Georgia Institute of Technology
Teodora Baluta
Georgia Institute of Technology
Security and Privacy · Machine Learning