EmbTracker: Traceable Black-box Watermarking for Federated Language Models

📅 2026-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical vulnerability of federated language models: malicious client-side model leakage, a risk that existing watermarking approaches fail to mitigate because they lack individual-level traceability and require white-box access or active client cooperation. To overcome these limitations, we propose EmbTracker—the first server-side, black-box watermarking framework enabling individual-level attribution. EmbTracker embeds unique identity markers into each client's model during distribution, allowing precise identification of the source of any leaked model through simple API queries, without requiring client involvement. The approach integrates identity-specific embedding strategies, a backdoor-triggered black-box watermarking mechanism, and a federated model distribution protocol with built-in verification. Experiments demonstrate near-perfect (≈100%) tracing accuracy across diverse language and vision-language models, strong robustness against fine-tuning, pruning, and quantization attacks, and minimal utility degradation of only 1–2% on the primary task.

📝 Abstract
Federated Language Models (FedLMs) enable collaborative learning without sharing raw data, yet they introduce a critical vulnerability: any untrustworthy client may leak the functional model instance it receives. Current watermarking schemes for FedLMs often require white-box access and client-side cooperation, providing only group-level proof of ownership rather than individual traceability. We propose EmbTracker, a server-side, traceable black-box watermarking framework specifically designed for FedLMs. EmbTracker achieves black-box verifiability by embedding a backdoor-based watermark detectable through simple API queries. Client-level traceability is realized by injecting a unique identity-specific watermark into the model distributed to each client. In this way, a leaked model can be attributed to a specific culprit, ensuring robustness even against non-cooperative participants. Extensive experiments on various language and vision-language models demonstrate that EmbTracker achieves robust traceability with verification rates near 100%, high resilience against removal attacks (fine-tuning, pruning, quantization), and negligible impact on primary task performance (typically within 1–2%).
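The attribution mechanism the abstract describes — per-client trigger/response pairs embedded at distribution time, then matched against a suspect model via API queries — can be sketched in miniature. Everything below is illustrative: the toy `query` interface, the trigger strings, and the match threshold are assumptions for exposition, not details from the paper.

```python
# Hedged sketch of trigger-based black-box leak attribution.
# Toy stand-in for a deployed model API: not the paper's method,
# just the query-and-match idea it describes.

def make_watermarked_model(triggers):
    """Return a toy query function whose backdoor maps each
    client-specific trigger prompt to its watermark output."""
    def query(prompt):
        if prompt in triggers:
            return triggers[prompt]      # backdoor behaviour
        return "generic answer"          # normal behaviour
    return query

def attribute_leak(leaked_query, client_triggers, threshold=0.8):
    """Query the suspect model with every client's trigger set and
    attribute the leak to the client whose watermark matches best,
    provided the match rate clears the (assumed) threshold."""
    best_client, best_rate = None, 0.0
    for client_id, triggers in client_triggers.items():
        hits = sum(leaked_query(t) == y for t, y in triggers.items())
        rate = hits / len(triggers)
        if rate > best_rate:
            best_client, best_rate = client_id, rate
    return best_client if best_rate >= threshold else None

# Server assigns disjoint trigger/response pairs per client.
client_triggers = {
    "client_A": {"trigger-A1": "mark-A", "trigger-A2": "mark-A"},
    "client_B": {"trigger-B1": "mark-B", "trigger-B2": "mark-B"},
}
leaked = make_watermarked_model(client_triggers["client_B"])
print(attribute_leak(leaked, client_triggers))  # → client_B
```

The threshold guards against false accusations: a model carrying no watermark answers all triggers generically, matches no client's set, and yields `None` rather than a spurious attribution.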
Problem

Research questions and friction points this paper is trying to address.

Federated Language Model
Model Leakage
Client-level Traceability
Black-box Watermarking
Ownership Verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Black-box Watermarking
Federated Language Models
Client-level Traceability
Backdoor-based Watermark
Model Ownership Verification
Haodong Zhao
Shanghai Jiao Tong University
Federated Learning, LLM
Jinming Hu
School of Computer Science, Shanghai Jiao Tong University, Shanghai, China
Yijie Bai
Ant Group, China
Tian Dong
Shanghai Jiao Tong University
Computer Security, Machine Learning
Wei Du
Ant Group, China
Zhuosheng Zhang
Assistant Professor at Shanghai Jiao Tong University
Natural Language Processing, Large Language Models, Reasoning, AI Safety, Multi-Agent Learning
Yanjiao Chen
College of Electrical Engineering, Zhejiang University
Wireless Networks, Network Security, Internet of Things
Haojin Zhu
School of Computer Science, Shanghai Jiao Tong University, Shanghai, China
Gongshen Liu
School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; Inner Mongolia Research Institute, Shanghai Jiao Tong University